Annals of Applied Statistics

Approximate inference for constructing astronomical catalogs from images

Jeffrey Regier, Andrew C. Miller, David Schlegel, Ryan P. Adams, Jon D. McAuliffe, and Prabhat

Full-text: Open access

Abstract

We present a new, fully generative model for constructing astronomical catalogs from optical telescope image sets. Each pixel intensity is treated as a random variable with parameters that depend on the latent properties of stars and galaxies. These latent properties are themselves modeled as random. We compare two procedures for posterior inference. One procedure is based on Markov chain Monte Carlo (MCMC) while the other is based on variational inference (VI). The MCMC procedure excels at quantifying uncertainty, while the VI procedure is 1000 times faster. On a supercomputer, the VI procedure efficiently uses 665,000 CPU cores to construct an astronomical catalog from 50 terabytes of images in 14.6 minutes, demonstrating the scaling characteristics necessary to construct catalogs for upcoming astronomical surveys.
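To make the modeling idea in the abstract concrete, here is a minimal toy sketch rather than the authors' model or code. It treats each pixel count as Poisson with a rate that depends on a single latent quantity, the flux of one star, and then computes a posterior over that flux. Everything specific is an assumption made for illustration: the 9×9 image, the circular Gaussian point spread function, the sky level, the flat prior, and all function names. The posterior is computed by brute-force evaluation on a grid, which stands in for (and is neither) the MCMC and VI procedures the paper compares.

```python
import math
import random


def expected_rates(flux, center=(4.0, 4.0), size=9, psf_sigma=1.2, sky=5.0):
    """Expected photon count in each pixel: a flat sky background plus one
    point source whose light is spread by a circular Gaussian PSF.
    All default values are hypothetical."""
    cx, cy = center
    norm = 1.0 / (2.0 * math.pi * psf_sigma ** 2)
    return [
        [
            sky + flux * norm
            * math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * psf_sigma ** 2))
            for x in range(size)
        ]
        for y in range(size)
    ]


def sample_poisson(rate, rng):
    """Draw one Poisson variate with Knuth's multiplication method
    (adequate for the modest rates in this toy image)."""
    limit = math.exp(-rate)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def simulate_image(flux, rng):
    """Sample every pixel independently given the latent flux."""
    return [[sample_poisson(r, rng) for r in row] for row in expected_rates(flux)]


def log_likelihood(image, flux):
    """Poisson log-likelihood of the observed pixel counts for a given flux."""
    rates = expected_rates(flux)
    return sum(
        k * math.log(r) - r - math.lgamma(k + 1)
        for row_k, row_r in zip(image, rates)
        for k, r in zip(row_k, row_r)
    )


def grid_posterior(image, flux_grid):
    """Normalized posterior over flux_grid under a flat prior on the grid."""
    log_post = [log_likelihood(image, f) for f in flux_grid]
    peak = max(log_post)  # subtract the max before exponentiating for stability
    weights = [math.exp(lp - peak) for lp in log_post]
    total = sum(weights)
    return [w / total for w in weights]


rng = random.Random(0)
true_flux = 500.0  # hypothetical latent flux used to simulate the image
image = simulate_image(true_flux, rng)
flux_grid = [5.0 * i for i in range(201)]  # candidate fluxes 0, 5, ..., 1000
posterior = grid_posterior(image, flux_grid)
mean = sum(f * p for f, p in zip(flux_grid, posterior))
sd = math.sqrt(sum((f - mean) ** 2 * p for f, p in zip(flux_grid, posterior)))
print(f"posterior mean flux = {mean:.1f}, posterior sd = {sd:.1f}")
```

A grid is only workable here because the toy model has one latent variable. With many sources, each carrying position, brightness, color, and shape parameters, exhaustive evaluation is infeasible, which is why approximate procedures such as MCMC and VI are needed.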

Article information

Source
Ann. Appl. Stat., Volume 13, Number 3 (2019), 1884-1926.

Dates
Received: February 2018
Revised: April 2019
First available in Project Euclid: 17 October 2019

Permanent link to this document
https://projecteuclid.org/euclid.aoas/1571277777

Digital Object Identifier
doi:10.1214/19-AOAS1258

Mathematical Reviews number (MathSciNet)
MR4019161

Zentralblatt MATH identifier
07145979

Keywords
Astronomy; graphical model; MCMC; variational inference; high performance computing

Citation

Regier, Jeffrey; Miller, Andrew C.; Schlegel, David; Adams, Ryan P.; McAuliffe, Jon D.; Prabhat. Approximate inference for constructing astronomical catalogs from images. Ann. Appl. Stat. 13 (2019), no. 3, 1884–1926. doi:10.1214/19-AOAS1258. https://projecteuclid.org/euclid.aoas/1571277777



