## Electronic Journal of Statistics

### Mass volume curves and anomaly ranking

#### Abstract

This paper formulates the issue of ranking multivariate unlabeled observations according to their degree of abnormality as an unsupervised statistical learning task. In the one-dimensional setting, this problem is usually tackled by means of tail estimation techniques: univariate observations are viewed as increasingly ‘abnormal’ the farther they lie in the tail(s) of the underlying probability distribution. It would likewise be desirable to have a scalar-valued ‘scoring’ function for comparing the degree of abnormality of multivariate observations. Here we formulate the issue of scoring anomalies as an M-estimation problem by means of a novel functional performance criterion, referred to as the Mass Volume curve (MV curve for short), whose optimal elements are strictly increasing transforms of the density almost everywhere on its support. We first study the statistical estimation of the MV curve of a given scoring function and provide a strategy for building confidence regions using a smoothed bootstrap approach. We then tackle optimization of this functional criterion over the set of piecewise constant scoring functions. This boils down to estimating a sequence of empirical minimum volume sets whose levels are chosen adaptively from the data, so as to adjust to the variations of the optimal MV curve while controlling the bias of its approximation by a stepwise curve. Generalization bounds are then established for the sup-norm difference between the MV curve of the empirical scoring function thus obtained and the optimal MV curve.
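The MV curve described in the abstract pairs, for each mass level α, the Lebesgue volume of the score level set capturing mass α. A minimal sketch of the empirical version, assuming a toy two-dimensional Gaussian sample, a hypothetical scoring function, and a Monte Carlo volume estimate over a bounding box (all names and constants here are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def score(x):
    # Hypothetical scoring function: higher score = more "normal"
    # (a strictly increasing transform of the standard Gaussian density).
    return np.exp(-0.5 * np.sum(x ** 2, axis=1))

# Sample from the (unknown, here: standard bivariate normal) data distribution.
X = rng.standard_normal((10_000, 2))
s_X = score(X)

# Uniform points on a bounding box [-5, 5]^2 yield a Monte Carlo
# estimate of the Lebesgue volume of any score level set inside it.
U = rng.uniform(-5.0, 5.0, size=(100_000, 2))
s_U = score(U)
BOX_VOLUME = 10.0 ** 2

def mv_curve(alphas):
    """Empirical mass-volume curve of the scoring function `score`:
    for each mass level alpha, the volume of {x : score(x) >= t_alpha},
    where t_alpha is the empirical (1 - alpha)-quantile of the scores."""
    thresholds = np.quantile(s_X, 1.0 - np.asarray(alphas))
    return np.array([BOX_VOLUME * np.mean(s_U >= t) for t in thresholds])

alphas = np.array([0.5, 0.9, 0.99])
volumes = mv_curve(alphas)  # non-decreasing in alpha
```

A scoring function is better the lower its MV curve lies: for this toy example the score is a monotone transform of the density, so the level sets are the minimum volume sets themselves and the resulting curve approximates the optimal one.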

#### Article information

Source
Electron. J. Statist., Volume 12, Number 2 (2018), 2806–2872.

Dates
First available in Project Euclid: 18 September 2018

https://projecteuclid.org/euclid.ejs/1537257627

Digital Object Identifier
doi:10.1214/18-EJS1474

#### Citation

Clémençon, Stephan; Thomas, Albert. Mass volume curves and anomaly ranking. Electron. J. Statist. 12 (2018), no. 2, 2806–2872. doi:10.1214/18-EJS1474. https://projecteuclid.org/euclid.ejs/1537257627

#### References

• Cadre, B. (2006). Kernel Estimation of Density Level Sets. Journal of Multivariate Analysis 97 999–1023.
• Cadre, B., Pelletier, B. and Pudlo, P. (2013). Estimation of density level sets with a given probability content. Journal of Nonparametric Statistics 25 261–272.
• Cavalier, L. (1997). Nonparametric Estimation of Regression Level Sets. Statistics 29 131–160.
• Clémençon, S. and Jakubowicz, J. (2013). Scoring Anomalies: an M-estimation Formulation. In Proceedings of the 16th International Conference on Artificial Intelligence and Statistics, Scottsdale, USA.
• Clémençon, S. and Robbiano, S. (2014). Anomaly ranking as supervised bipartite ranking. In Proceedings of the 31st International Conference on Machine Learning, ICML 2014, Beijing, China 343–351.
• Clémençon, S. and Vayatis, N. (2009). Adaptive Estimation of the Optimal ROC Curve and a Bipartite Ranking Algorithm. In Algorithmic Learning Theory. Lecture Notes in Computer Science 5809 216–231. Springer Berlin Heidelberg.
• Csörgő, M. (1983). Quantile Processes with Statistical Applications. Society for Industrial and Applied Mathematics.
• Csörgő, M. and Révész, P. (1978). Strong Approximations of the Quantile Process. The Annals of Statistics 6 882–894.
• Csörgő, M. and Révész, P. (1981). Strong Approximations in Probability and Statistics. Academic Press.
• DeVore, R. (1987). A note on adaptive approximation. Approx. Theory Appl. 3 74–78.
• DeVore, R. A. (1998). Nonlinear Approximation. Acta Numerica 7 51–150.
• Donoho, D. and Gasko, M. (1992). Breakdown properties of location estimates based on halfspace depth and projected outlyingness. The Annals of Statistics 20 1803–1827.
• Efron, B. (1979). Bootstrap methods: another look at the jackknife. The Annals of Statistics 7 1–26.
• Egan, J. P. (1975). Signal Detection Theory and ROC Analysis. Academic Press.
• Einmahl, J. H. J. and Mason, D. M. (1992). Generalized Quantile Processes. The Annals of Statistics 20 1062–1078.
• Embrechts, P. and Hofert, M. (2013). A note on generalized inverses. Mathematical Methods of Operations Research 77 423–432.
• Falk, M. and Reiss, R. (1989). Weak convergence of smoothed and nonsmoothed bootstrap quantile estimates. The Annals of Probability 17 362–371.
• Giné, E. and Guillou, A. (2002). Rates of strong uniform consistency for multivariate kernel density estimators. Ann. Inst. H. Poincaré (B), Probabilités et Statistiques 38 907–921.
• Hall, P. (1986). On the number of bootstrap simulations required to construct a confidence interval. The Annals of Statistics 14 1453–1462.
• Koltchinskii, V. (1997). M-estimation, convexity and quantiles. The Annals of Statistics 25 435–477.
• Koltchinskii, V. (2006). Local Rademacher complexities and oracle inequalities in risk minimization (with discussion). The Annals of Statistics 34 2593–2706.
• Lifshits, M. A. (1987). On the distribution of the maximum of a Gaussian process. Theory of Probability and its Applications 31 125–132.
• Liu, R. Y., Parelius, J. M. and Singh, K. (1999). Multivariate analysis by data depth: descriptive statistics, graphics and inference. The Annals of Statistics 27 783–858.
• Lovász, L. and Vempala, S. (2006). Simulated annealing in convex bodies and an $O(n^4)$ volume algorithm. Journal of Computer and System Sciences 72 392–417.
• Mallat, S. (1990). A Wavelet Tour of Signal Processing. Academic Press.
• Massart, P. (1990). The Tight Constant in the Dvoretzky–Kiefer–Wolfowitz Inequality. The Annals of Probability 18 1269–1283.
• Müller, D. W. and Sawitzki, G. (1991). Excess Mass Estimates and Tests for Multimodality. Journal of the American Statistical Association 86 738–746.
• Pitt, L. D. and Tran, L. T. (1979). Local Sample Path Properties of Gaussian Fields. The Annals of Probability 7 477–493.
• Polonik, W. (1995). Measuring Mass Concentrations and Estimating Density Contour Clusters – An Excess Mass Approach. The Annals of Statistics 23 855–881.
• Polonik, W. (1997). Minimum volume sets and generalized quantile processes. Stochastic Processes and their Applications 69 1–24.
• Polonik, W. (1999). Concentration and goodness-of-fit in higher dimensions: (asymptotically) distribution-free methods. The Annals of Statistics 27 1210–1229.
• Rigollet, P. and Vert, R. (2009). Optimal rates for plug-in estimators of density level sets. Bernoulli 15 1154–1178.
• Sargan, J. D. and Mikhail, W. M. (1971). A General Approximation to the Distribution of Instrumental Variables Estimates. Econometrica 39 131–169.
• Scott, C. and Nowak, R. (2006). Learning Minimum Volume Sets. Journal of Machine Learning Research 7 665–704.
• Silverman, B. and Young, G. (1987). The bootstrap: to smooth or not to smooth. Biometrika 74 469–479.
• Steinwart, I., Hush, D. and Scovel, C. (2005). A classification framework for anomaly detection. Journal of Machine Learning Research 6 211–232.
• Stute, W. (1982). A Law of the Logarithm for Kernel Density Estimators. The Annals of Probability 10 414–422.
• Tsirel’son, V. S. (1976). The Density of the Distribution of the Maximum of a Gaussian Process. Theory of Probability and its Applications 20 847–856.
• Tsybakov, A. (1997). On nonparametric estimation of density level sets. The Annals of Statistics 25 948–969.
• Tukey, J. (1975). Mathematics and the picturing of data. In Proceedings of the International Congress of Mathematicians, Vancouver (R. D. James, ed.) 2 523–531. Canadian Math. Congress.
• Viswanathan, K., Choudur, L., Talwar, V., Wang, C., Macdonald, G. and Satterfield, W. (2012). Ranking Anomalies in Data Centers. In Network Operations and System Management 79–87. IEEE.
• Wand, M. P. and Jones, M. C. (1994). Kernel Smoothing. Chapman & Hall/CRC Monographs on Statistics & Applied Probability. Taylor & Francis.
• Zuo, Y. and Serfling, R. (2000). General notions of statistical depth function. The Annals of Statistics 28 461–482.