Annals of Applied Statistics
- Ann. Appl. Stat.
- Volume 13, Number 3 (2019), 1884-1926.
Approximate inference for constructing astronomical catalogs from images
Jeffrey Regier, Andrew C. Miller, David Schlegel, Ryan P. Adams, Jon D. McAuliffe, and Prabhat
Abstract
We present a new, fully generative model for constructing astronomical catalogs from optical telescope image sets. Each pixel intensity is treated as a random variable with parameters that depend on the latent properties of stars and galaxies. These latent properties are themselves modeled as random. We compare two procedures for posterior inference. One procedure is based on Markov chain Monte Carlo (MCMC) while the other is based on variational inference (VI). The MCMC procedure excels at quantifying uncertainty, while the VI procedure is 1000 times faster. On a supercomputer, the VI procedure efficiently uses 665,000 CPU cores to construct an astronomical catalog from 50 terabytes of images in 14.6 minutes, demonstrating the scaling characteristics necessary to construct catalogs for upcoming astronomical surveys.
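The generative idea in the abstract, pixel intensities as random variables whose parameters depend on random latent source properties, can be illustrated with a toy sketch. Everything specific here is an assumption made for illustration and is not taken from the abstract: the Poisson pixel distribution, the isotropic Gaussian point spread function, the flat sky background, the log-normal flux prior, the uniform position prior, and all function and parameter names. It is not the authors' model or code.

```python
import numpy as np


def expected_pixel_rates(shape, sky, flux, center, psf_sigma):
    """Expected photon count per pixel for one point source.

    Assumed form: flat sky background plus the source flux spread by an
    isotropic Gaussian point spread function (hypothetical choice).
    """
    rows, cols = np.indices(shape)
    r2 = (rows - center[0]) ** 2 + (cols - center[1]) ** 2
    psf = np.exp(-0.5 * r2 / psf_sigma ** 2) / (2.0 * np.pi * psf_sigma ** 2)
    return sky + flux * psf


def simulate_image(rng, shape=(21, 21), sky=50.0, psf_sigma=1.5):
    """Draw latent source properties, then draw pixels given them."""
    # Latent properties are themselves random (hypothetical priors).
    flux = rng.lognormal(mean=7.0, sigma=0.5)
    center = rng.uniform(8.0, 12.0, size=2)
    rates = expected_pixel_rates(shape, sky, flux, center, psf_sigma)
    # Each pixel is a random variable whose parameter depends on the latents.
    pixels = rng.poisson(rates)
    return pixels, rates, flux, center


def poisson_log_likelihood(pixels, rates):
    """Poisson log-likelihood of the image, dropping the log(pixels!) constant."""
    return float(np.sum(pixels * np.log(rates) - rates))


rng = np.random.default_rng(0)
pixels, rates, flux, center = simulate_image(rng)
log_lik = poisson_log_likelihood(pixels, rates)
```

Posterior inference, whether by MCMC or by variational inference, would then work backward from `pixels` to a distribution over latent quantities such as `flux` and `center`; this sketch covers only the forward, generative direction.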
Article information
Source
Ann. Appl. Stat., Volume 13, Number 3 (2019), 1884-1926.
Dates
Received: February 2018
Revised: April 2019
First available in Project Euclid: 17 October 2019
Permanent link to this document
https://projecteuclid.org/euclid.aoas/1571277777
Digital Object Identifier
doi:10.1214/19-AOAS1258
Mathematical Reviews number (MathSciNet)
MR4019161
Zentralblatt MATH identifier
07145979
Keywords
Astronomy; graphical model; MCMC; variational inference; high performance computing
Citation
Regier, Jeffrey; Miller, Andrew C.; Schlegel, David; Adams, Ryan P.; McAuliffe, Jon D.; Prabhat. Approximate inference for constructing astronomical catalogs from images. Ann. Appl. Stat. 13 (2019), no. 3, 1884-1926. doi:10.1214/19-AOAS1258. https://projecteuclid.org/euclid.aoas/1571277777
Supplemental materials
- Supplement: Kullback-Leibler divergences. Formulas for KL divergences between common distributions that appear in the derivation of the variational lower bound.
Digital Object Identifier: doi:10.1214/19-AOAS1258SUPP
