Electronic Journal of Statistics

Measuring distributional asymmetry with Wasserstein distance and Rademacher symmetrization

Adam B. Kashlak


Abstract

We propose an improved version of the ubiquitous symmetrization inequality that uses the Wasserstein distance between a measure and its reflection to quantify the asymmetry of the given measure. An empirical bound on this asymmetry correction term is derived through a bootstrap procedure and is shown to give tighter results in practical settings than the original, uncorrected inequality. Lastly, a wide range of applications is detailed, including testing for data symmetry, constructing nonasymptotic high-dimensional confidence sets, bounding the variance of an empirical process, and improving constants in Nemirovski-style inequalities for Banach space valued random variables.
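The two ingredients named in the abstract can be illustrated on real-valued data with the minimal sketch below. It is not the paper's estimator or its bootstrap bound. It assumes one-dimensional observations, takes the reflection to be x ↦ -x applied to the mean-centered sample (the centering is an illustrative choice), and uses the absolute value as the norm. `reflection_w1` computes the empirical 1-Wasserstein distance between a sample and its reflection. On the real line the optimal coupling of two equal-size empirical measures pairs order statistics, so the distance reduces to the mean of |x_(i) + x_(n+1-i)|. `rademacher_average` is a Monte Carlo estimate, conditional on the data, of the Rademacher term on the right-hand side of the classical symmetrization inequality E|Σ(X_i - EX_i)| ≤ 2 E|Σ ε_i X_i|. Both function names are hypothetical.

```python
import random
from statistics import fmean


def reflection_w1(xs, center=True):
    """Empirical W1 distance between a 1-D sample and its reflection x -> -x.

    Illustrative sketch only (hypothetical helper). The sorted reflected
    sample is [-s[n-1], ..., -s[0]], so pairing order statistics gives
    W1 = mean |s[i] + s[n-1-i]|. A symmetric sample gives 0.
    """
    if not xs:
        raise ValueError("need at least one observation")
    if center:
        # Assumption: reflect about the sample mean rather than a known center.
        m = fmean(xs)
        xs = [x - m for x in xs]
    s = sorted(xs)
    n = len(s)
    return sum(abs(s[i] + s[n - 1 - i]) for i in range(n)) / n


def rademacher_average(xs, n_draws=2000, seed=0):
    """Monte Carlo estimate of E_eps |(1/n) sum_i eps_i x_i| given the data.

    eps_i are i.i.d. Rademacher signs (+1 or -1 with probability 1/2 each).
    Hypothetical helper for illustration.
    """
    if not xs:
        raise ValueError("need at least one observation")
    rng = random.Random(seed)
    n = len(xs)
    total = 0.0
    for _ in range(n_draws):
        # Draw one Rademacher sign vector and form the signed average.
        signed_sum = sum(x if rng.random() < 0.5 else -x for x in xs)
        total += abs(signed_sum) / n
    return total / n_draws


if __name__ == "__main__":
    symmetric = [-2.0, -1.0, 1.0, 2.0]
    skewed = [0.0, 0.0, 0.0, 4.0]
    print(reflection_w1(symmetric))  # 0.0: the sample equals its reflection
    print(reflection_w1(skewed))     # 2.0: centered sample [-1, -1, -1, 3]
    print(rademacher_average(skewed))
```

In more than one dimension the optimal coupling is no longer given by sorting. Computing the empirical distance then requires solving an assignment problem, which this sketch does not attempt.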

Article information

Source
Electron. J. Statist., Volume 12, Number 2 (2018), 2091-2113.

Dates
Received: July 2017
First available in Project Euclid: 13 July 2018

Permanent link to this document
https://projecteuclid.org/euclid.ejs/1531468822

Digital Object Identifier
doi:10.1214/18-EJS1440

Keywords
Concentration inequality; generalized bootstrap; high-dimensional confidence set; type and cotype

Rights
Creative Commons Attribution 4.0 International License.

Citation

Kashlak, Adam B. Measuring distributional asymmetry with Wasserstein distance and Rademacher symmetrization. Electron. J. Statist. 12 (2018), no. 2, 2091–2113. doi:10.1214/18-EJS1440. https://projecteuclid.org/euclid.ejs/1531468822


