Statistical Science

Importance Sampling: Intrinsic Dimension and Computational Cost

S. Agapiou, O. Papaspiliopoulos, D. Sanz-Alonso, and A. M. Stuart

Full-text: Access denied (no subscription detected)

We're sorry, but we are unable to provide you with the full text of this article because we are not able to identify you as a subscriber. If you have a personal subscription to this journal, then please login. If you are already logged in, then you may need to update your profile to register your subscription. Read more about accessing full-text

Abstract

The basic idea of importance sampling is to use independent samples from a proposal measure in order to approximate expectations with respect to a target measure. It is key to understand how many samples are required in order to guarantee accurate approximations. Intuitively, some notion of distance between the target and the proposal should determine the computational cost of the method. A major challenge is to quantify this distance in terms of parameters or statistics that are pertinent for the practitioner. The subject has attracted substantial interest from within a variety of communities. The objective of this paper is to overview and unify the resulting literature by creating an overarching framework. A general theory is presented, with a focus on the use of importance sampling in Bayesian inverse problems and filtering.

Article information

Source
Statist. Sci. Volume 32, Number 3 (2017), 405-431.

Dates
First available in Project Euclid: 1 September 2017

Permanent link to this document
https://projecteuclid.org/euclid.ss/1504253124

Digital Object Identifier
doi:10.1214/17-STS611

Keywords
Importance sampling notions of dimension small noise absolute continuity inverse problems filtering

Citation

Agapiou, S.; Papaspiliopoulos, O.; Sanz-Alonso, D.; Stuart, A. M. Importance Sampling: Intrinsic Dimension and Computational Cost. Statist. Sci. 32 (2017), no. 3, 405--431. doi:10.1214/17-STS611. https://projecteuclid.org/euclid.ss/1504253124


Export citation

References

  • [1] Achutegui, K., Crisan, D., Miguez, J. and Rios, G. (2014). A simple scheme for the parallelization of particle filters and its application to the tracking of complex stochastic systems. Preprint. Available at arXiv:1407.8071.
  • [2] Agapiou, S., Larsson, S. and Stuart, A. M. (2013). Posterior contraction rates for the Bayesian approach to linear ill-posed inverse problems. Stochastic Process. Appl. 123 3828–3860.
  • [3] Agapiou, S. and Mathé, P. (2014). Preconditioning the prior to overcome saturation in Bayesian inverse problems. Preprint. Available at arXiv:1409.6496.
  • [4] Agapiou, S., Papaspiliopoulos, O., Sanz-Alonso, D. and Stuart, A. M. (2017). Supplement to “Importance sampling: Intrinsic dimension and computational cost.” DOI:10.1214/17-STS611SUPP.
  • [5] Agapiou, S., Stuart, A. M. and Zhang, Y.-X. (2014). Bayesian posterior contraction rates for linear severely ill-posed inverse problems. J. Inverse Ill-Posed Probl. 22 297–321.
  • [6] Bain, A. and Crisan, D. (2009). Fundamentals of Stochastic Filtering 3. Springer, Berlin.
  • [7] Bender, C. M. and Orszag, S. A. (1999). Advanced Mathematical Methods for Scientists and Engineers. I. Asymptotic Methods and Perturbation Theory. Springer, New York.
  • [8] Bengtsson, T., Bickel, P., Li, B. et al. (2008). Curse-of-dimensionality revisited: Collapse of the particle filter in very large scale systems. In Probability and Statistics: Essays in Honor of David A. Freedman. Inst. Math. Stat. (IMS) Collect. 316–334. IMS, Beachwood, OH.
  • [9] Beskos, A., Crisan, D. and Jasra, A. (2014). On the stability of sequential Monte Carlo methods in high dimensions. Ann. Appl. Probab. 24 1396–1445.
  • [10] Beskos, A., Crisan, D., Jasra, A., Kamatani, K. and Zhou, Y. (2014). A stable particle filter in high-dimensions. Preprint. Available at 1412.3501.
  • [11] Bickel, P., Li, B. and Bengtsson, T. (2008). Sharp failure rates for the bootstrap particle filter in high dimensions. In Pushing the Limits of Contemporary Statistics: Contributions in Honor of Jayanta K. Ghosh. Inst. Math. Stat. (IMS) Collect. 3 318–329. IMS, Beachwood, OH.
  • [12] Bishop, C. M. (2006). Pattern Recognition and Machine Learning. Springer, New York.
  • [13] Boucheron, S., Lugosi, G. and Massart, P. (2013). Concentration Inequalities. Oxford Univ. Press, Oxford.
  • [14] Bui-Thanh, T. and Ghattas, O. (2012). Analysis of the Hessian for inverse scattering problems: I. Inverse shape scattering of acoustic waves. Inverse Problems 28 055001, 32.
  • [15] Bui-Thanh, T., Ghattas, O., Martin, J. and Stadler, G. (2013). A computational framework for infinite-dimensional Bayesian inverse problems part I: The linearized case, with application to global seismic inversion. SIAM J. Sci. Comput. 35 A2494–A2523.
  • [16] Caflisch, R. E., Morokoff, W. J. and Owen, A. B. (1997). Valuation of Mortgage Backed Securities Using Brownian Bridges to Reduce Effective Dimension. Dept. Mathematics, Univ. California, Los Angeles.
  • [17] Caponnetto, A. and De Vito, E. (2007). Optimal rates for the regularized least-squares algorithm. Found. Comput. Math. 7 331–368.
  • [18] Cavalier, L. and Tsybakov, A. (2002). Sharp adaptation for inverse problems with random noise. Probab. Theory Related Fields 123 323–354.
  • [19] Chatterjee, S. and Diaconis, P. (2015). The sample size required in importance sampling. Preprint. Available at arXiv:1511.01437.
  • [20] Chen, Y. (2005). Another look at rejection sampling through importance sampling. Statist. Probab. Lett. 72 277–283.
  • [21] Chopin, N. (2004). Central limit theorem for sequential Monte Carlo methods and its application to Bayesian inference. Ann. Statist. 32 2385–2411.
  • [22] Chopin, N. and Papaspiliopoulos, O. (2016). A Concise Introduction to Sequential Monte Carlo.
  • [23] Chorin, A. J. and Morzfeld, M. (2013). Conditions for successful data assimilation. Journal of Geophysical Research: Atmospheres 118 11–522.
  • [24] Constantine, P. G. (2015). Active Subspaces: Emerging Ideas for Dimension Reduction in Parameter Studies 2. SIAM Spotlights, Philadelphia, PA.
  • [25] Cotter, S. L., Roberts, G. O., Stuart, A. M. and White, D. (2013). MCMC methods for functions: Modifying old algorithms to make them faster. Statist. Sci. 28 424–446.
  • [26] Crisan, D., Del Moral, P. and Lyons, T. (1999). Discrete filtering using branching and interacting particle systems. Markov Process. Related Fields 5 293–318.
  • [27] Crisan, D. and Doucet, A. (2002). A survey of convergence results on particle filtering methods for practitioners. IEEE Trans. Signal Process. 50 736–746.
  • [28] Crisan, D. and Rozovskiĭ, B., eds. (2011). The Oxford Handbook of Nonlinear Filtering. Oxford Univ. Press, Oxford.
  • [29] Cui, T., Martin, J., Marzouk, Y. M., Solonen, A. and Spantini, A. (2014). Likelihood-informed dimension reduction for nonlinear inverse problems. Inverse Problems 30 114015, 28.
  • [30] Dashti, M. and Stuart, A. M. (2016). The Bayesian approach to inverse problems. In Handbook of Uncertainty Quantification (R. Ghanem, D. Higdon and H. Owhadi, eds.). http://arxiv.org/abs/1302.6989.
  • [31] Del Moral, P. (2004). Feynman–Kac Formulae. Springer, New York.
  • [32] Del Moral, P. and Miclo, L. (2000). Branching and interacting particle systems approximations of Feynman–Kac formulae with applications to non-linear filtering. In Séminaire de Probabilités, XXXIV. Lecture Notes in Math. 1729 1–145. Springer, Berlin.
  • [33] Doucet, A., de Freitas, N. and Gordon, N. (2001). An introduction to sequential Monte Carlo methods. In Sequential Monte Carlo Methods in Practice. 3–14. Springer, New York.
  • [34] Doucet, A., Godsill, S. and Andrieu, C. (2000). On sequential Monte Carlo sampling methods for Bayesian filtering. Stat. Comput. 10 197–208.
  • [35] Doukhan, P. and Lang, G. (2009). Evaluation for moments of a ratio with application to regression estimation. Bernoulli 15 1259–1286.
  • [36] Downey, P. J. and Wright, P. E. (2007). The ratio of the extreme to the sum in a random sequence. Extremes 10 249–266.
  • [37] Dupuis, P., Spiliopoulos, K. and Wang, H. (2012). Importance sampling for multiscale diffusions. Multiscale Model. Simul. 10 1–27.
  • [38] Engl, H. W., Hanke, M. and Neubauer, A. (1996). Regularization of Inverse Problems. Mathematics and Its Applications 375. Kluwer Academic, Dordrecht.
  • [39] Franklin, J. N. (1970). Well-posed stochastic extensions of ill-posed linear problems. J. Math. Anal. Appl. 31 682–716.
  • [40] Frei, M. and Künsch, H. R. (2013). Bridging the ensemble Kalman and particle filters. Biometrika 100 781–800.
  • [41] Furrer, R. and Bengtsson, T. (2007). Estimation of high-dimensional prior and posterior covariance matrices in Kalman filter variants. J. Multivariate Anal. 98 227–255.
  • [42] Gelman, A., Roberts, G. O. and Gilks, W. R. (1996). Efficient Metropolis jumping rules. In Bayesian Statistics, 5 (Alicante, 1994). 599–607. Oxford Univ. Press, New York.
  • [43] Gibbs, A. L. and Su, F. E. (2002). On choosing and bounding probability metrics. Int. Stat. Rev. 70 419–435.
  • [44] Goodman, J., Lin, K. K. and Morzfeld, M. (2015). Small-noise analysis and symmetrization of implicit Monte Carlo samplers. Comm. Pure Appl. Math.
  • [45] Han, W. (2013). On the Numerical Solution of the Filtering Problem Ph.D. thesis, Dept. Mathematics, Imperial College, London.
  • [46] Johansen, A. M. and Doucet, A. (2008). A note on auxiliary particle filters. Statist. Probab. Lett. 78 1498–1504.
  • [47] Kahn, H. (1955). Use of Different Monte Carlo Sampling Techniques. Rand Corporation, Santa Monica, CA.
  • [48] Kahn, H. and Marshall, A. W. (1953). Methods of reducing sample size in Monte Carlo computations. Journal of the Operations Research Society of America 1 263–278.
  • [49] Kaipio, J. and Somersalo, E. (2005). Statistical and Computational Inverse Problems. Applied Mathematical Sciences 160. Springer, New York.
  • [50] Kallenberg, O. (2002). Foundations of Modern Probability. Probability and Its Applications, 2nd ed. Springer, Berlin.
  • [51] Kalman, R. E. (1960). A new approach to linear filtering and prediction problems. J. Fluids Eng. 82 35–45.
  • [52] Kalnay, E. (2003). Atmospheric Modeling, Data Assimilation, and Predictability. Cambridge Univ. Press, Cambridge.
  • [53] Kekkonen, H., Lassas, M. and Siltanen, S. (2016). Posterior consistency and convergence rates for Bayesian inversion with hypoelliptic operators. Inverse Probl. 32 085005, 31.
  • [54] Knapik, B. T., van Der Vaart, A. W. and van Zanten, J. H. (2011). Bayesian inverse problems with Gaussian priors. Ann. Statist. 39 2626–2657.
  • [55] Knapik, B. T., van der Vaart, A. W. and van Zanten, J. H. (2013). Bayesian recovery of the initial condition for the heat equation. Comm. Statist. Theory Methods 42 1294–1313.
  • [56] Kong, A. (1992). A note on importance sampling using standardized weights. Technical Report 348, Dept. Statistics, Univ. Chicago, Chicago, IL.
  • [57] Kong, A., Liu, J. S. and Wong, W. H. (1994). Sequential imputations and Bayesian missing data problems. J. Amer. Statist. Assoc. 89 278–288.
  • [58] Kuo, F. Y. and Sloan, I. H. (2005). Lifting the curse of dimensionality. Notices Amer. Math. Soc. 52 1320–1329.
  • [59] Lancaster, P. and Rodman, L. (1995). Algebraic Riccati Equations. Oxford Univ. Press, London.
  • [60] Lasanen, S. (2007). Measurements and infinite-dimensional statistical inverse theory. PAMM 7 1080101–1080102.
  • [61] Lasanen, S. (2012). Non-Gaussian statistical inverse problems. Part I: Posterior distributions. Inverse Probl. Imaging 6 215–266.
  • [62] Lasanen, S. (2012). Non-Gaussian statistical inverse problems. Part II: Posterior convergence for approximated unknowns. Inverse Probl. Imaging 6 267–287.
  • [63] Lax, P. D. (2002). Functional Analysis. Wiley, New York.
  • [64] Lehtinen, M. S., Paivarinta, L. and Somersalo, E. (1989). Linear inverse problems for generalised random variables. Inverse Probl. 5 599.
  • [65] Le Gland, F., Monbet, V. and Tran, V. (2009). Large sample asymptotics for the ensemble Kalman filter Ph.D. thesis, INRIA.
  • [66] Li, W., Tan, Z. and Chen, R. (2013). Two-stage importance sampling with mixture proposals. J. Amer. Statist. Assoc. 108 1350–1365.
  • [67] Liang, F., Liu, C. and Carroll, R. J. (2007). Stochastic approximation in Monte Carlo computation. J. Amer. Statist. Assoc. 102 305–320.
  • [68] Lin, K., Lu, S. and Mathé, P. (2015). Oracle-type posterior contraction rates in Bayesian inverse problems. Inverse Probl. Imaging 9.
  • [69] Lindley, D. V. and Smith, A. F. M. (1972). Bayes estimates for the linear model. J. R. Stat. Soc. Ser. B. Stat. Methodol. 34 1–41.
  • [70] Liu, J. S. (1996). Metropolized independent sampling with comparisons to rejection sampling and importance sampling. Stat. Comput. 6 113–119.
  • [71] Liu, J. S. (2008). Monte Carlo Strategies in Scientific Computing. Springer, New York.
  • [72] Liu, J. S. and Chen, R. (1998). Sequential Monte Carlo methods for dynamic systems. J. Amer. Statist. Assoc. 93 1032–1044.
  • [73] Lu, S. and Mathé, P. (2014). Discrepancy based model selection in statistical inverse problems. J. Complexity 30 290–308.
  • [74] Mandelbaum, A. (1984). Linear estimators and measurable linear transformations on a Hilbert space. Z. Wahrsch. Verw. Gebiete 65 385–397.
  • [75] Martin, J., Wilcox, L. C., Burstedde, C. and Ghattas, O. (2012). A stochastic Newton MCMC method for large-scale statistical inverse problems with application to seismic inversion. SIAM J. Sci. Comput. 34 A1460–A1487.
  • [76] McLeish, D. L. and O’Brien, G. L. (1982). The expected ratio of the sum of squares to the square of the sum. Ann. Probab. 10 1019–1028.
  • [77] Míguez, J., Crisan, D. and Djurić, P. M. (2013). On the convergence of two sequential Monte Carlo methods for maximum a posteriori sequence estimation and stochastic global optimization. Stat. Comput. 23 91–107.
  • [78] Morzfeld, M., Tu, X., Wilkening, J. and Chorin, A. J. (2015). Parameter estimation by implicit sampling. Commun. Appl. Math. Comput. Sci. 10 205–225.
  • [79] Moskowitz, B. and Caflisch, R. E. (1996). Smoothness and dimension reduction in quasi-Monte Carlo methods. Math. Comput. Modelling 23 37–54.
  • [80] Mueller, J. L. and Siltanen, S. (2012). Linear and Nonlinear Inverse Problems with Practical Applications. Computational Science & Engineering 10. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA.
  • [81] Oliver, D. S., Reynolds, A. C. and Liu, N. (2008). Inverse Theory for Petroleum Reservoir Characterization and History Matching. Cambridge Univ. Press, Cambridge.
  • [82] Owen, A. and Zhou, Y. (2000). Safe and effective importance sampling. J. Amer. Statist. Assoc. 95 135–143.
  • [83] Pitt, M. K. and Shephard, N. (1999). Filtering via simulation: Auxiliary particle filters. J. Amer. Statist. Assoc. 94 590–599.
  • [84] Ray, K. (2013). Bayesian inverse problems with non-conjugate priors. Electron. J. Stat. 7 2516–2549.
  • [85] Rebeschini, P. and van Handel, R. (2015). Can local particle filters beat the curse of dimensionality? Ann. Appl. Probab. 25 2809–2866.
  • [86] Ren, Y.-F. and Liang, H.-Y. (2001). On the best constant in Marcinkiewicz–Zygmund inequality. Statist. Probab. Lett. 53 227–233.
  • [87] Sanz-Alonso, D. (2016). Importance sampling and necessary sample size: An information theory approach. Preprint. Available at arXiv:1608.08814.
  • [88] Slivinski, L. and Snyder, C. Practical estimates of the ensemble size necessary for particle filters.
  • [89] Snyder, C. (2011). Particle filters, the “optimal” proposal and high-dimensional systems. In Proceedings of the ECMWF Seminar on Data Assimilation for Atmosphere and Ocean.
  • [90] Snyder, C., Bengtsson, T., Bickel, P. and Anderson, J. (2008). Obstacles to high-dimensional particle filtering. Mon. Weather Rev. 136 4629–4640.
  • [91] Snyder, C., Bengtsson, T. and Morzfeld, M. (2015). Performance bounds for particle filters using the optimal proposal. Mon. Weather Rev. 143 4750–4761.
  • [92] Spantini, A., Solonen, A., Cui, T., Martin, J., Tenorio, L. and Marzouk, Y. (2015). Optimal low-rank approximations of Bayesian linear inverse problems. SIAM J. Sci. Comput. 37 A2451–A2487.
  • [93] Spiegelhalter, D. J., Best, N. G., Carlin, B. P. and van der Linde, A. (2002). Bayesian measures of model complexity and fit. J. R. Stat. Soc. Ser. B. Stat. Methodol. 64 583–639.
  • [94] Spiliopoulos, K. (2013). Large deviations and importance sampling for systems of slow-fast motion. Appl. Math. Optim. 67 123–161.
  • [95] Stuart, A. M. (2010). Inverse problems: A Bayesian perspective. Acta Numer. 19 451–559.
  • [96] Tan, Z. (2004). On a likelihood approach for Monte Carlo integration. J. Amer. Statist. Assoc. 99 1027–1036.
  • [97] Tierney, L. (1998). A note on Metropolis–Hastings kernels for general state spaces. Ann. Appl. Probab. 8 1–9.
  • [98] van Leeuwen, P. J. (2010). Nonlinear data assimilation in geosciences: An extremely efficient particle filter. Q. J. R. Meteorol. Soc. 136 1991–1999.
  • [99] Vanden-Eijnden, E. and Weare, J. (2012). Data assimilation in the low noise, accurate observation regime with application to the Kuroshio current. Mon. Weather Rev. 141.
  • [100] Vollmer, S. J. (2013). Posterior consistency for Bayesian inverse problems through stability and regression results. Inverse Probl. 29 125011.
  • [101] Whiteley, N., Lee, A. and Heine, K. (2016). On the role of interaction in sequential Monte Carlo algorithms. Bernoulli 22 494–529.
  • [102] Zhang, T. (2002). Effective dimension and generalization of kernel learning. In Advances in Neural Information Processing Systems 454–461.
  • [103] Zhang, T. (2005). Learning bounds for kernel regression using effective data dimensionality. Neural Comput. 17 2077–2098.
  • [104] Zhang, W., Hartmann, C., Weber, M. and Schütte, C. (2013). Importance sampling in path space for diffusion processes. Multiscale Model. Simul.

Supplemental materials

  • Supplement to “Importance sampling: Intrinsic dimension and computational cost”. The Supplementary Material contains the proofs of all our results. It also contains some background on Gaussian measures on Hilbert spaces, and related technical aspects arising from considering measures in infinite dimensional spaces.