## Bernoulli


### Analysis of the Forward Search using some new results for martingales and empirical processes

#### Abstract

The Forward Search is an iterative algorithm for avoiding outliers in a regression analysis, suggested by Hadi and Simonoff (J. Amer. Statist. Assoc. 88 (1993) 1264–1272); see also Atkinson and Riani (Robust Diagnostic Regression Analysis (2000) Springer). The algorithm constructs subsets of “good” observations whose size increases as the algorithm progresses, yielding a sequence of regression estimators and forward residuals. Outliers are detected by monitoring the sequence of forward residuals. We show that the sequences of regression estimators and forward residuals converge to Gaussian processes. The proof involves a new iterated martingale inequality, a theory for a new class of weighted and marked empirical processes, the corresponding quantile process theory, and a fixed-point argument that captures the iterative aspect of the procedure.
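To make the iteration concrete, the following is a minimal sketch of the Forward Search for linear regression, not the authors' implementation (for that, see the ForwardSearch R package in the references). At each step it fits least squares on the current subset, ranks all observations by squared residual, and takes the best-fitting m+1 observations as the next subset; the forward residual monitored for outlier detection is taken here as the smallest absolute residual among observations outside the current subset. The function name and the simple full-sample initialization are illustrative assumptions; practical versions start from a robust fit such as least median of squares.

```python
import numpy as np

def forward_search(X, y, m0=None):
    """Sketch of the Forward Search for the regression of y on X.

    Returns the estimator from the final (size n-1) subset fit and the
    sequence of forward residuals recorded as the subset grows.
    """
    n, p = X.shape
    m = p + 1 if m0 is None else m0
    # Illustrative initialization: the m observations best fitted by a
    # full-sample fit (robust starts such as LMS are used in practice).
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    subset = np.argsort((y - X @ beta) ** 2)[:m]
    forward_residuals = []
    while m < n:
        # Fit on the current "good" subset only.
        beta = np.linalg.lstsq(X[subset], y[subset], rcond=None)[0]
        resid2 = (y - X @ beta) ** 2
        # Forward residual: closest observation outside the subset.
        outside = np.setdiff1d(np.arange(n), subset)
        forward_residuals.append(np.sqrt(resid2[outside].min()))
        # Grow the subset by one; note observations may interchange,
        # so the subsets are not necessarily nested.
        m += 1
        subset = np.argsort(resid2)[:m]
    return beta, np.array(forward_residuals)
```

On data with a single gross outlier, the outlier typically enters the subset last, so the final forward residual spikes relative to the earlier ones; monitoring that sequence is how the procedure flags outliers.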

#### Article information

**Source**
Bernoulli, Volume 22, Number 2 (2016), 1131–1183.

**Dates**
Revised: November 2014
First available in Project Euclid: 9 November 2015

**Permanent link**
https://projecteuclid.org/euclid.bj/1447077772

**Digital Object Identifier**
doi:10.3150/14-BEJ689

**Mathematical Reviews number (MathSciNet)**
MR3449811

**Zentralblatt MATH identifier**
06562308

#### Citation

Johansen, Søren; Nielsen, Bent. Analysis of the Forward Search using some new results for martingales and empirical processes. Bernoulli 22 (2016), no. 2, 1131–1183. doi:10.3150/14-BEJ689. https://projecteuclid.org/euclid.bj/1447077772

#### References

• [1] Aroian, L. (1941). A study of R.A. Fisher’s $z$ distribution and the related $F$ distribution. Ann. Math. Statist. 12 429–448.
• [2] Atkinson, A. and Riani, M. (2000). Robust Diagnostic Regression Analysis. New York: Springer.
• [3] Atkinson, A.C. (1994). Fast very robust methods for detection of multiple outliers. J. Amer. Statist. Assoc. 89 1329–1339.
• [4] Atkinson, A.C. and Riani, M. (2006). Distribution theory and simulations for tests of outliers in regression. J. Comput. Graph. Statist. 15 460–476.
• [5] Atkinson, A.C., Riani, M. and Cerioli, A. (2010). The Forward Search: Theory and data analysis (with discussion). J. Korean Statist. Soc. 39 117–134.
• [6] Atkinson, A.C., Riani, M. and Cerioli, A. (2010). Rejoinder: The Forward Search: Theory and data analysis. J. Korean Statist. Soc. 39 161–163.
• [7] Bahadur, R.R. (1966). A note on quantiles in large samples. Ann. Math. Statist. 37 577–580.
• [8] Bellini, T. (2015). The forward search interactive outlier detection in cointegrated VAR analysis. Adv. Data Anal. Classif. To appear.
• [9] Bercu, B. and Touati, A. (2008). Exponential inequalities for self-normalized martingales with applications. Ann. Appl. Probab. 18 1848–1869.
• [10] Bickel, P.J. (1975). One-step Huber estimates in the linear model. J. Amer. Statist. Assoc. 70 428–434.
• [11] Billingsley, P. (1999). Convergence of Probability Measures, 2nd ed. New York: Wiley.
• [12] Cavaliere, G. and Georgiev, I. (2013). Exploiting infinite variance through dummy variables in nonstationary autoregressions. Econometric Theory 29 1162–1195.
• [13] Cerioli, A., Farcomeni, A. and Riani, M. (2014). Strong consistency and robustness of the Forward Search estimator of multivariate location and scatter. J. Multivariate Anal. 126 167–183.
• [14] Csörgő, M. (1983). Quantile Processes with Statistical Applications. CBMS-NSF Regional Conference Series in Applied Mathematics 42. Philadelphia, PA: SIAM.
• [15] Dollinger, M.B. and Staudte, R.G. (1991). Influence functions of iteratively reweighted least squares estimators. J. Amer. Statist. Assoc. 86 709–716.
• [16] Engler, E. and Nielsen, B. (2009). The empirical process of autoregressive residuals. Econom. J. 12 367–381.
• [17] Guenther, W.C. (1977). An easy method for obtaining percentage points of order statistics. Technometrics 19 319–321.
• [18] Hadi, A.S. (1992). Identifying multiple outliers in multivariate data. J. R. Stat. Soc. Ser. B. Stat. Methodol. 54 761–771.
• [19] Hadi, A.S. and Simonoff, J.S. (1993). Procedures for the identification of multiple outliers in linear models. J. Amer. Statist. Assoc. 88 1264–1272.
• [20] Hawkins, D.M. and Olive, D.J. (2002). Inconsistency of resampling algorithms for high-breakdown regression estimators and a new algorithm. J. Amer. Statist. Assoc. 97 136–159.
• [21] Helland, I.S. (1982). Central limit theorems for martingales with discrete or continuous time. Scand. J. Stat. 9 79–94.
• [22] Johansen, S. and Nielsen, B. (2009). An analysis of the indicator saturation estimator as a robust regression estimator. In The Methodology and Practice of Econometrics (J.L. Castle and N. Shephard, eds.) 1–36. Oxford: Oxford Univ. Press.
• [23] Johansen, S. and Nielsen, B. (2010). Discussion: The Forward Search: Theory and data analysis. J. Korean Statist. Soc. 39 137–145.
• [24] Johansen, S. and Nielsen, B. (2013). Outlier detection in regression using an iterated one-step approximation to the Huber-skip estimator. Econometrics 1 53–70.
• [25] Johansen, S. and Nielsen, B. (2015). Asymptotic theory of M-estimators in linear time series regression models. Discussion paper, Univ. Copenhagen.
• [26] Johansen, S. and Nielsen, B. (2015). Asymptotic theory of outlier detection algorithms for linear time series regression models. Scand. J. Stat. To appear.
• [27] Kiefer, J. (1967). On Bahadur’s representation of sample quantiles. Ann. Math. Statist. 38 1323–1342.
• [28] Koul, H.L. and Ossiander, M. (1994). Weak convergence of randomly weighted dependent residual empiricals with applications to autoregression. Ann. Statist. 22 540–562.
• [29] Lee, S. and Wei, C.-Z. (1999). On residual empirical processes of stochastic regression models with applications to time series. Ann. Statist. 27 237–261.
• [30] Nielsen, B. (2014). ForwardSearch. R package version 1. Available at http://www.R-project.org.
• [31] R Development Core Team (2011). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. Available at http://www.R-project.org.
• [32] Revuz, D. and Yor, M. (1998). Continuous Martingales and Brownian Motion, 3rd ed. Berlin: Springer.
• [33] Riani, M. and Atkinson, A.C. (2007). Fast calibrations of the Forward Search for testing multiple outliers in regression. Adv. Data Anal. Classif. 1 123–141.
• [34] Riani, M., Atkinson, A.C. and Cerioli, A. (2009). Finding an unknown number of multivariate outliers. J. R. Stat. Soc. Ser. B. Stat. Methodol. 71 447–466.
• [35] Rousseeuw, P.J. (1984). Least median of squares regression. J. Amer. Statist. Assoc. 79 871–880.
• [36] Rousseeuw, P.J. and Leroy, A.M. (1987). Robust Regression and Outlier Detection. New York: Wiley.
• [37] Ruppert, D. and Carroll, R.J. (1980). Trimmed least squares estimation in the linear model. J. Amer. Statist. Assoc. 75 828–838.
• [38] Sampford, M.R. (1953). Some inequalities on Mill’s ratio and related functions. Ann. Math. Statist. 24 130–132.
• [39] Shorack, G.R. (1979). Weak convergence of empirical and quantile processes in sup-norm metrics via KMT-constructions. Stochastic Process. Appl. 9 95–98.
• [40] Simpson, D.G., Ruppert, D. and Carroll, R.J. (1992). On one-step GM estimates and stability of inferences in linear regression. J. Amer. Statist. Assoc. 87 439–450.
• [41] Soms, A.P. (1976). An asymptotic expansion for the tail area of the $t$-distribution. J. Amer. Statist. Assoc. 71 728–730.
• [42] Víšek, J.Á. (2006). The least trimmed squares. Part I: Consistency. Kybernetika (Prague) 42 1–36.
• [43] Víšek, J.Á. (2006). The least trimmed squares. Part II: $\sqrt{n}$-consistency. Kybernetika (Prague) 42 181–202.
• [44] Víšek, J.Á. (2006). The least trimmed squares. Part III: Asymptotic normality. Kybernetika (Prague) 42 203–224.
• [45] Welsh, A.H. and Ronchetti, E. (2002). A journey in single steps: Robust one-step $M$-estimation in linear regression. J. Statist. Plann. Inference 103 287–310.