## The Annals of Statistics

### Support points

#### Abstract

This paper introduces a new way to compact a continuous probability distribution $F$ into a set of representative points called support points. These points are obtained by minimizing the energy distance, a statistical potential measure initially proposed by Székely and Rizzo [InterStat 5 (2004) 1–6] for testing goodness-of-fit. The energy distance has two appealing features. First, its distance-based structure allows us to exploit the duality between powers of the Euclidean distance and its Fourier transform for theoretical analysis. Using this duality, we show that support points converge in distribution to $F$, and enjoy a better error rate than Monte Carlo for integrating a large class of functions. Second, the minimization of the energy distance can be formulated as a difference-of-convex program, which we manipulate using two algorithms to efficiently generate representative point sets. In simulation studies, support points provide better integration performance than both Monte Carlo and a specific quasi-Monte Carlo method. Two important applications of support points are then highlighted: (a) as a way to quantify the propagation of uncertainty in expensive simulations and (b) as a method to optimally compact Markov chain Monte Carlo (MCMC) samples in Bayesian computation.
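To make the criterion in the abstract concrete, the following is a minimal sketch of how the energy distance between a candidate point set and $F$ might be estimated. It assumes the standard definition $2\,\mathbb{E}\|X-Y\| - \mathbb{E}\|X-X'\| - \mathbb{E}\|Y-Y'\|$, with each expectation replaced by an average over all pairs, and it assumes $F$ is available only through a sample. The function names `pairwise_mean_distance` and `energy_distance` are illustrative and are not taken from the paper or its accompanying software. The sketch evaluates the objective only; it does not implement the difference-of-convex algorithms the paper uses to minimize it.

```python
import numpy as np


def pairwise_mean_distance(a, b):
    """Mean Euclidean distance over all pairs (a_i, b_j)."""
    diffs = a[:, None, :] - b[None, :, :]   # shape (len(a), len(b), d)
    return np.sqrt((diffs ** 2).sum(axis=-1)).mean()


def energy_distance(points, sample):
    """Plug-in estimate of the energy distance between two point sets.

    `points` is an (n, d) array of candidate representative points.
    `sample` is an (N, d) array standing in for draws from the target F.
    Both names are hypothetical, chosen for this illustration.
    """
    points = np.asarray(points, dtype=float)
    sample = np.asarray(sample, dtype=float)
    return (2.0 * pairwise_mean_distance(points, sample)
            - pairwise_mean_distance(points, points)
            - pairwise_mean_distance(sample, sample))


rng = np.random.default_rng(0)
sample = rng.standard_normal((500, 2))   # assumed target: standard normal in 2-D
close = rng.standard_normal((50, 2))     # candidate points drawn from the same F
far = close + 3.0                        # the same points shifted away from F

print("matched points:", energy_distance(close, sample))
print("shifted points:", energy_distance(far, sample))
```

Under these assumptions, a point set that represents $F$ well gives a value near zero, and a poorly placed set gives a larger value. Support points are the set of a given size that makes this quantity as small as possible.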

#### Article information

Source
Ann. Statist., Volume 46, Number 6A (2018), 2562–2592.

Dates
Revised: August 2017
First available in Project Euclid: 7 September 2018

Permanent link to this document
https://projecteuclid.org/euclid.aos/1536307226

Digital Object Identifier
doi:10.1214/17-AOS1629

Mathematical Reviews number (MathSciNet)
MR3851748

Zentralblatt MATH identifier
06968592

Subjects
Primary: 62E17: Approximations to distributions (nonasymptotic)

#### Citation

Mak, Simon; Joseph, V. Roshan. Support points. Ann. Statist. 46 (2018), no. 6A, 2562–2592. doi:10.1214/17-AOS1629. https://projecteuclid.org/euclid.aos/1536307226

#### References

• [1] Ascher, U. M. and Greif, C. (2011). A First Course in Numerical Methods. Computational Science & Engineering 7. SIAM, Philadelphia, PA.
• [2] Bahouri, H., Chemin, J.-Y. and Danchin, R. (2011). Fourier Analysis and Nonlinear Partial Differential Equations. Grundlehren der Mathematischen Wissenschaften 343. Springer, Heidelberg.
• [3] Borodachov, S. V., Hardin, D. P. and Saff, E. B. (2014). Low complexity methods for discretizing manifolds via Riesz energy minimization. Found. Comput. Math. 14 1173–1208.
• [4] Bousquet, O. and Bottou, L. (2008). The tradeoffs of large scale learning. In Advances in Neural Information Processing Systems 161–168.
• [5] Carpenter, B., Gelman, A., Hoffman, M., Lee, D., Goodrich, B., Betancourt, M., Brubaker, M. A., Guo, J., Li, P. and Riddell, A. (2017). Stan: A probabilistic programming language. J. Stat. Softw. 76 1–32.
• [6] Cox, D. R. (1957). Note on grouping. J. Amer. Statist. Assoc. 52 543–547.
• [7] Dalenius, T. (1950). The problem of optimum stratification. Scand. Actuar. J. 1950 203–213.
• [8] Dick, J., Kuo, F. Y. and Sloan, I. H. (2013). High-dimensional integration: The quasi-Monte Carlo way. Acta Numer. 22 133–288.
• [9] Dick, J. and Pillichshammer, F. (2010). Digital Nets and Sequences: Discrepancy Theory and Quasi-Monte Carlo Integration. Cambridge Univ. Press, Cambridge.
• [10] Di Nezza, E., Palatucci, G. and Valdinoci, E. (2012). Hitchhiker’s guide to the fractional Sobolev spaces. Bulletin des Sciences Mathématiques 136 521–573.
• [11] Draper, N. R. and Smith, H. (1981). Applied Regression Analysis, 2nd ed. Wiley, New York.
• [12] Dutang, C. and Savicky, P. (2013). randtoolbox: Generating and testing random numbers. R package.
• [13] Fang, K.-T. (1980). The uniform design: Application of number-theoretic methods in experimental design. Acta Math. Appl. Sin. 3 363–372.
• [14] Fang, K.-T., Lu, X., Tang, Y. and Yin, J. (2004). Constructions of uniform designs by using resolvable packings and coverings. Discrete Math. 274 25–40.
• [15] Fang, K.-T. and Wang, Y. (1994). Number-Theoretic Methods in Statistics. Monographs on Statistics and Applied Probability 51. Chapman & Hall, London.
• [16] Flury, B. A. (1990). Principal points. Biometrika 77 33–41.
• [17] Gelfand, I. and Shilov, G. (1964). Generalized Functions, Vol. I: Properties and Operations. Academic Press, New York.
• [18] Genz, A. (1984). Testing multidimensional integration routines. In Proc. of International Conference on Tools, Methods and Languages for Scientific and Engineering Computation 81–94. Elsevier, North-Holland.
• [19] Geyer, C. J. (1992). Practical Markov chain Monte Carlo. Statist. Sci. 7 473–483.
• [20] Ghadimi, S. and Lan, G. (2013). Stochastic first- and zeroth-order methods for nonconvex stochastic programming. SIAM J. Optim. 23 2341–2368.
• [21] Girolami, M. and Calderhead, B. (2011). Riemann manifold Langevin and Hamiltonian Monte Carlo methods. J. R. Stat. Soc. Ser. B. Stat. Methodol. 73 123–214.
• [22] Graf, S. and Luschgy, H. (2000). Foundations of Quantization for Probability Distributions. Springer, Berlin.
• [23] Hickernell, F. J. (1998). A generalized discrepancy and quadrature error bound. Math. Comp. 67 299–322.
• [24] Hickernell, F. J. (1999). Goodness-of-fit statistics, discrepancies and robust designs. Statist. Probab. Lett. 44 73–78.
• [25] Hunter, J. K. and Nachtergaele, B. (2001). Applied Analysis. World Scientific, Singapore.
• [26] Joe, S. and Kuo, F. Y. (2003). Remark on algorithm 659: Implementing Sobol’s quasirandom sequence generator. ACM Trans. Math. Software 29 49–57.
• [27] Joseph, V. R., Dasgupta, T., Tuo, R. and Wu, C. F. J. (2015). Sequential exploration of complex surfaces using minimum energy designs. Technometrics 57 64–74.
• [28] Joseph, V. R., Gul, E. and Ba, S. (2015). Maximum projection designs for computer experiments. Biometrika 102 371–380.
• [29] Kiefer, J. (1961). On large deviations of the empiric df of vector chance variables and a law of the iterated logarithm. Pacific J. Math. 11 649–660.
• [30] Kolmogorov, A. (1933). Sulla determinazione empirica delle leggi di probabilita. G. Ist. Ital. Attuari 4 1–11.
• [31] Korolyuk, V. and Borovskikh, Y. V. (1989). Convergence rate for degenerate von Mises functionals. Theory Probab. Appl. 33 125–135.
• [32] Kuo, F. Y. and Sloan, I. H. (2005). Lifting the curse of dimensionality. Notices Amer. Math. Soc. 52 1320–1328.
• [33] Lange, K. (2016). MM Optimization Algorithms. SIAM, Philadelphia.
• [34] Link, W. A. and Eaton, M. J. (2012). On thinning of chains in MCMC. Methods Ecol. Evol. 3 112–115.
• [35] Lipp, T. and Boyd, S. (2016). Variations and extension of the convex-concave procedure. Optim. Eng. 17 263–287.
• [36] Lloyd, S. (1982). Least squares quantization in PCM. IEEE Trans. Inform. Theory 28 129–137.
• [37] Mairal, J. (2013). Stochastic majorization-minimization algorithms for large-scale optimization. In Advances in Neural Information Processing Systems 2283–2291.
• [38] Mak, S. (2017). support: Support Points. R package version 0.1.0.
• [39] Mak, S. and Joseph, V. R. (2018). Minimax and minimax projection designs using clustering. J. Comput. Graph. Statist. 27 166–178.
• [40] Mak, S. and Joseph, V. R. (2018). Supplement to “Support points.” DOI:10.1214/17-AOS1629SUPP.
• [41] Mak, S., Sung, C.-L., Wang, X., Yeh, S.-T., Chang, Y.-H., Joseph, V. R., Yang, V. and Wu, C. F. J. (2017). An efficient surrogate model for emulation and physics extraction of large eddy simulations. Preprint. Available at arXiv:1611.07911.
• [42] Mak, S. and Xie, Y. (2017). Uncertainty quantification and design for noisy matrix completion - a unified framework. Preprint. Available at arXiv:1706.08037.
• [43] Matsumoto, M. and Nishimura, T. (1998). Mersenne twister: A 623-dimensionally equidistributed uniform pseudo-random number generator. ACM Trans. Model. Comput. Simul. 8 3–30.
• [44] Nichols, J. A. and Kuo, F. Y. (2014). Fast CBC construction of randomly shifted lattice rules achieving $\mathcal{O}(n^{-1+\delta})$ convergence for unbounded integrands over $\mathbb{R}^{s}$ in weighted spaces with POD weights. J. Complexity 30 444–468.
• [45] Niederreiter, H. (1992). Random Number Generation and Quasi-Monte Carlo Methods. CBMS-NSF Regional Conference Series in Applied Mathematics 63. SIAM, Philadelphia, PA.
• [46] Nuyens, D. and Cools, R. (2006). Fast algorithms for component-by-component construction of rank-1 lattice rules in shift-invariant reproducing kernel Hilbert spaces. Math. Comp. 75 903–920.
• [47] Ortega, J. M. and Rheinboldt, W. C. (2000). Iterative Solution of Nonlinear Equations in Several Variables. SIAM, Philadelphia.
• [48] Owen, A. B. (1998). Scrambling Sobol’ and Niederreiter–Xing points. J. Complexity 14 466–489.
• [49] Owen, A. B. and Tribble, S. D. (2005). A quasi-Monte Carlo Metropolis algorithm. Proc. Natl. Acad. Sci. USA 102 8844–8849.
• [50] Pagès, G., Pham, H. and Printems, J. (2004). Optimal quantization methods and applications to numerical problems in finance. In Handbook of Computational and Numerical Methods in Finance 253–297. Birkhäuser, Boston, MA.
• [51] Paley, R. and Zygmund, A. (1930). On some series of functions. In Mathematical Proceedings of the Cambridge Philosophical Society 26 337–357. Cambridge Univ. Press, Cambridge.
• [52] R Core Team (2017). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.
• [53] Resnick, S. I. (2014). A Probability Path. Birkhäuser/Springer, New York.
• [54] Rosenblatt, M. (1952). Remarks on a multivariate transformation. Ann. Math. Stat. 23 470–472.
• [55] Royden, H. L. and Fitzpatrick, P. (2010). Real Analysis. Macmillan, New York.
• [56] Santner, T. J., Williams, B. J. and Notz, W. I. (2013). The Design and Analysis of Computer Experiments. Springer, New York.
• [57] Serfling, R. J. (2009). Approximation Theorems of Mathematical Statistics. Wiley, New York.
• [58] Shorack, G. R. (2000). Probability for Statisticians. Springer, New York.
• [59] Sloan, I. H., Kuo, F. Y. and Joe, S. (2002). Constructing randomly shifted lattice rules in weighted Sobolev spaces. SIAM J. Numer. Anal. 40 1650–1665.
• [60] Sobol’, I. M. (1967). Distribution of points in a cube and approximate evaluation of integrals. Ž. Vyčisl. Mat. i Mat. Fiz. 7 784–802.
• [61] Su, Y. (2000). Asymptotically optimal representative points of bivariate random vectors. Statist. Sinica 10 559–576.
• [62] Székely, G. J. (2003). E-statistics: The energy of statistical samples. Technical Report 03-05, Dept. Mathematics and Statistics, Bowling Green State Univ., Bowling Green, OH.
• [63] Székely, G. J. and Rizzo, M. L. (2004). Testing for equal distributions in high dimension. InterStat 5 1–6.
• [64] Székely, G. J. and Rizzo, M. L. (2013). Energy statistics: A class of statistics based on distances. J. Statist. Plann. Inference 143 1249–1272.
• [65] Tao, P. D. and An, L. T. H. (1997). Convex analysis approach to DC programming: Theory, algorithms and applications. Acta Math. Vietnam. 22 289–355.
• [66] Tuy, H. (1986). A general deterministic approach to global optimization via dc programming. In FERMAT Days 85: Mathematics for Optimization (J. B. Hiriart-Urruty, ed.). North-Holland Mathematics Studies 129 137–162. North-Holland, Amsterdam.
• [67] Tuy, H. (1995). DC optimization: Theory, methods and algorithms. In Handbook of Global Optimization 149–216. Springer, Berlin.
• [68] Wendland, H. (2005). Scattered Data Approximation. Cambridge Univ. Press, Cambridge.
• [69] Worley, B. A. (1987). Deterministic uncertainty analysis. Technical Report ORNL-6428, Oak Ridge National Laboratory, Oak Ridge, TN.
• [70] Yang, G. (2012). The energy goodness-of-fit test for univariate stable distributions. Ph.D. thesis, Bowling Green State Univ., Bowling Green, OH.
• [71] Yuille, A. L. and Rangarajan, A. (2003). The concave-convex procedure. Neural Comput. 15 915–936.
• [72] Zador, P. (1982). Asymptotic quantization error of continuous signals and the quantization dimension. IEEE Trans. Inform. Theory 28 139–149.

#### Supplemental materials

• Supplement A: Additional proofs and results. We provide in this supplement further details on technical results and simulation studies.