The Annals of Statistics

Local Extremes, Runs, Strings and Multiresolution

P. L. Davies and A. Kovac

Full-text: Open access

Abstract

The paper considers the problem of nonparametric regression with emphasis on controlling the number of local extremes. Two methods, the run method and the taut-string multiresolution method, are introduced and analyzed on standard test beds. It is shown that the number and locations of local extreme values are consistently estimated. Rates of convergence are proved for both methods. The run method converges slowly but can withstand blocks as well as a high proportion of isolated outliers. The rate of convergence of the taut-string multiresolution method is almost optimal. The method is extremely sensitive and can detect very low power peaks.

Section 1 contains an introduction with special reference to the number of local extreme values. The run method is described in Section 2 and the taut-string multiresolution method in Section 3. Low power peaks are considered in Section 4. Section 5 contains a comparison with other methods and Section 6 a short conclusion. The proofs are given in Section 7 and the taut-string algorithm is described in the Appendix.

Article information

Source
Ann. Statist. Volume 29, Number 1 (2001), 1-65.

Dates
First available in Project Euclid: 5 August 2001

Permanent link to this document
https://projecteuclid.org/euclid.aos/996986501

Digital Object Identifier
doi:10.1214/aos/996986501

Mathematical Reviews number (MathSciNet)
MR1833958

Zentralblatt MATH identifier
1029.62038

Subjects
Primary: 62G07: Density estimation
Secondary: 65D10: Smoothing, curve fitting; 62G20: Asymptotic properties

Keywords
Nonparametric regression; local extremes; runs; strings; multiresolution analysis; asymptotics; outliers; low power peaks

Citation

Davies, P. L.; Kovac, A. Local Extremes, Runs, Strings and Multiresolution. Ann. Statist. 29 (2001), no. 1, 1--65. doi:10.1214/aos/996986501. https://projecteuclid.org/euclid.aos/996986501.



References

  • Barlow, R., Bartholomew, D., Bremner, J. and Brunk, H. (1972). Statistical Inference under Order Restrictions. Wiley, New York.
  • Chaudhuri, P. and Marron, J. S. (1999). SiZer for exploration of structures in curves. J. Amer. Statist. Assoc. 94 807-823.
  • Davies, P. (1995). Data features. Statist. Neerlandica 49 185-245.
  • Davies, P. L. (2000). Hidden periodicities and strings. Dept. Mathematics and Computer Science, Univ. Essen, Germany.
  • Davies, P. L. and Löwendick, M. (1999). On smoothing under bounds and geometric constraints. Technical report, Univ. Essen.
  • Delecroix, M., Simioni, M. and Thomas-Agnan, C. (1995). A shape constrained smoother: simulation study. Comput. Statist. 10 155-175.
  • Donoho, D. (1988). One-sided inference about functionals of a density. Ann. Statist. 16 1390-1420.
  • Donoho, D. L., Johnstone, I. M., Kerkyacharian, G. and Picard, D. (1995). Wavelet shrinkage: asymptopia? J. Roy. Statist. Soc. 57 371-394.
  • Dümbgen, L. (1998a). Application of local rank tests to nonparametric regression. Medical Univ., Lübeck.
  • Dümbgen, L. (1998b). New goodness-of-fit tests and their application to nonparametric confidence sets. Ann. Statist. 26 288-314.
  • Fan, J. and Gijbels, I. (1995). Data-driven bandwidth selection in local polynomial fitting: variable bandwidth and spatial adaptation. J. Roy. Statist. Soc. Ser. B 57 371-394.
  • Fan, J. and Gijbels, I. (1996). Local Polynomial Modelling and Its Applications. Chapman and Hall, London.
  • Feller, W. (1968). An Introduction to Probability Theory and Its Applications 1, 3rd ed. Wiley, New York.
  • Fisher, N. I., Mammen, E. and Marron, J. S. (1994). Testing for multimodality. Comput. Statist. Data Anal. 18 499-512.
  • Freedman, D. (1971). Brownian Motion and Diffusion, 3rd. ed. Holden-Day, San Francisco.
  • Good, I. and Gaskins, R. (1980). Density estimation and bump-hunting by the penalized likelihood method exemplified by scattering and meteorite data. J. Amer. Statist. Assoc. 75 42-73.
  • Green, P. and Silverman, B. (1994). Nonparametric Regression and Generalized Linear Models: a Roughness Penalty Approach. Chapman and Hall, London.
  • Groeneboom, P. (1985). Estimating a monotone density. In Proceedings of the Berkeley Conference in Honor of Jerzy Neyman and Jack Kiefer 2 (L. Le Cam and R. Olshen, eds.) Wadsworth, Monterey, CA.
  • Hartigan, J. A. (1987). Estimation of a convex density contour in two dimensions. J. Amer. Statist. Assoc. 82 267-270.
  • Hartigan, J. A. and Hartigan, P. (1985). The dip test of unimodality. Ann. Statist. 13 70-84.
  • Hengartner, N. and Stark, P. (1995). Finite-sample confidence envelopes for shape-restricted densities. Ann. Statist. 23 525-550.
  • Ibragimov, I. and Khas'minskii, R. (1980). On nonparametric estimation of regression. Soviet Math. Dokl. 21 810-814.
  • Kahane, J.-P. (1968). Some Random Series of Functions. Heath, Lexington, MA.
  • Khas'minskii, R. (1978). A lower bound on the risks of non-parametric estimates of densities in the uniform metric. Theory Probab. Appl. 23 794-798.
  • Kovac, A. and Silverman, B. W. (2000). Extending the scope of wavelet regression methods by coefficient-dependent thresholding. J. Amer. Statist. Assoc. 95 172-183.
  • Leurgans, S. (1982). Asymptotic distributions of slope-of-greatest-convex-minorant estimators. Ann. Statist. 10 287-296.
  • Majidi, A. (2000). Nichtparametrische Regression unter Modalitätskontrolle. Diplomarbeit, Fachbereich Mathematik und Informatik, Univ. Essen.
  • Mammen, E. (1991). Nonparametric regression under qualitative smoothness assumptions. Ann. Statist. 19 741-759.
  • Mammen, E., Marron, J., Turlach, B. and Wand, M. (1998). A general framework for constrained smoothing. Unpublished manuscript.
  • Mammen, E. and Thomas-Agnan, C. (1998). Smoothing splines and shape restrictions. Unpublished manuscript.
  • Mammen, E. and van de Geer, S. (1997). Locally adaptive regression splines. Ann. Statist. 25 387-413.
  • Mächler, M. (1995). Variational solution of penalized likelihood problems and smooth curve estimation. Ann. Statist. 23 1496-1517.
  • Metzner, L. (1997). Facettierte nichtparametrische Regression. Ph.D. thesis, Univ. Essen, Germany.
  • Minotte, M. C. (1997). Nonparametric testing of the existence of modes. Ann. Statist. 25 1646-1660.
  • Minotte, M. C. and Scott, D. W. (1993). The mode tree: a tool for visualization of nonparametric density features. J. Comput. Graph. Statist. 2 51-68.
  • Morgenthaler, S. and Tukey, J. (1991). Configural Polysampling: A Route to Practical Robustness. Wiley, New York.
  • Müller, D. and Sawitzki, G. (1991). Excess mass estimates and tests of multimodality. J. Amer. Statist. Assoc. 86 738-746.
  • Nadaraya, E. A. (1964). On estimating regression. Theory Probab. Appl. 10 186-190.
  • Polonik, W. (1995a). Density estimation under qualitative assumptions in higher dimensions. J. Multivariate Anal. 55 61-81.
  • Polonik, W. (1995b). Measuring mass concentrations and estimating density contour clusters: an excess mass approach. Ann. Statist. 23 855-881.
  • Polonik, W. (1999). Concentration and goodness of fit in higher dimensions: (asymptotically) distribution-free methods. Ann. Statist. 27 1210-1229.
  • Polzehl, J. and Spokoiny, G. (2000). Adaptive weights smoothing with applications to image restoration. J. Roy. Statist. Soc. Ser. B 62 335-354.
  • Ramsay, J. (1998). Estimating smooth monotone functions. J. Roy. Statist. Soc. 60 365-375.
  • Robertson, T. (1967). On estimating a density measurable with respect to a σ-lattice. Ann. Math. Statist. 38 482-493.
  • Robertson, T., Wright, F. T. and Dykstra, R. L. (1988). Order Restricted Statistical Inference. Wiley, New York.
  • Sager, T. W. (1979). An iterative method for estimating a multivariate mode and isopleth. J. Amer. Statist. Assoc. 74 329-339.
  • Sager, T. W. (1982). Nonparametric maximum likelihood estimation of spatial patterns. Ann. Statist. 10 1125-1136.
  • Sager, T. W. (1986). An application of isotonic regression to multivariate density estimation. In Advances in Order Restricted Statistical Inference (R. L. Dykstra, T. Robertson and F. T. Wright, eds.) Springer, New York.
  • Silverman, B. W. (1985). Some aspects of the spline smoothing approach to non-parametric regression curve fitting. J. Roy. Statist. Soc. 47 1-52.
  • Silverman, B. W. (1986). Density Estimation for Statistics and Data Analysis. Chapman and Hall, London.
  • Stone, C. (1982). Optimal rates of convergence for nonparametric regression. Ann. Statist. 10 1040-1053.
  • Tukey, J. (1993). Exploratory analysis of variance as providing examples of strategic choices. In New Directions in Statistical Data Analysis and Robustness (S. Morgenthaler, E. Ronchetti and W. A. Stahel, eds.) Birkhäuser, Basel.
  • Watson, G. S. (1964). Smooth regression analysis. Sankhyā 26 101-116.
  • Wegman, E. J. (1970). Maximum likelihood estimation of a unimodal density. Ann. Math. Statist. 41 457-471.
  • by Stein (1956). This naive estimator can have relatively high risk under a probability model. Once we have devised a more efficient estimator of µ, the second problem is interpolation among its components so as to estimate the function f. This is essentially a problem in approximation theory and is considerably more sensitive to assumptions on the nature of f than is the estimation of the $f(t_i)$. Because the data will not tell us how many derivatives f has, we might settle, in the absence of strong prior information, for linear or spline interpolation among the estimated components of µ. Instead, the Davies and Kovac estimators of f simultaneously determine the estimator of µ and the interpolation scheme by minimizing the number of local extrema in $f_n$, subject to achieving residuals at the $t_i$ that behave like white noise. Their idea is refreshingly novel. Comparing the performance of their estimators with linearly interpolated thresholding competitors when f is very wiggly seems to be a natural question. To consider separately the estimation and interpolation aspects of nonparametric regression clarifies what we can achieve in each respect. In particular, handling the practically important case where the $t_i$ are not all distinct can begin with treating an unbalanced one-way layout.
  • Beran, R. (2000). REACT scatterplot smoothers: superefficiency through basis economy. J. Amer. Statist. Assoc. 95 155-171.
  • Buckheit, J. B. and Donoho, D. L. (1995). WaveLab and reproducible research. Technical report 474, Dept. Statistics, Stanford Univ. Available at www-stat.stanford.edu/ donoho/.
  • McLuhan, E. and Zingrone, F. (1995). Essential McLuhan. House of Anansi Press, Concord, Ontario, Canada.
  • Stein, C. (1956). Inadmissibility of the usual estimator for the mean of a multivariate normal distribution. In Third Berkeley Symp. Math. Statist. Probab. 1 197-208. Univ. California Press, Berkeley.
  • Stein, C. (1981). Estimation of the mean of a multivariate normal distribution. Ann. Statist. 9 1135-1151.
  • Tukey, J. W. (1977). Exploratory Data Analysis. Addison-Wesley, Reading, MA.
  • Dümbgen, L. and Johns, R. B. (2000). Confidence bands for isotonic median curves via sign tests. Preprint.
  • Donoho, D. (1988). One-sided inference about functionals of a density. Ann. Statist. 16 1390-1420.
  • and Hartigan (1981). Some work on stretched strings for multimodal density estimation appears in Hartigan (2000). I believe the stretched string fit of a function $f(t_i)$ to data $y(t_i)$ is best expressed by finding the function f that minimizes the criterion $\sum_i (y(t_i) - f(t_i))^2 + \sum_{i=2}^{n} |f(t_i) - f(t_{i-1})|$. Perhaps the penalty function could be the weighted function
  • Kennedy, D. P. (1976). The distribution of the Maximum Brownian Excursion. J. Appl. Probab. 13 371-376.
  • Hartigan, J. A. (2000). Testing for antimodes. In Data Analysis, Scientific Modeling and Practical Application (W. Gaul, O. Opitz and M. Schader, eds.) 169-181. Springer, New York.
  • Hartigan, J. A. and Hartigan, P. (1985). The dip test of unimodality. Ann. Statist. 13 70-84.
  • in Sprague (1887). In the analysis of mortality tables he defines smoothness (or in the terminology of the paper of P. L. Davies and A. Kovac, simplicity) of a function as the number of modes of the first derivative (or equivalently, the number of concave or convex pieces of the function). For more history on smoothing (in actuarial sciences) and this notion of Sprague, see also Diewert and Wales (1993). By the way, Sprague was also one of the first researchers who considered smoothing as construction of roads through rough terrain. Removing hills and valleys as done in restricting the number of modes could also be motivated by this image. This is nicely illustrated in this paper [see also Mammen and van de Geer (1997)]: taut strings are constructed by moving earth from (local) hills to valleys. Also other measures of smoothness implicitly used in smoothing methods could be used in the approach of Davies and Kovac. The notion of simplicity that is used should depend on the aims of the statistical application.
  • can be found in Mammen and van de Geer (1997). A related approach has
  • and Portnoy (1994). For asymptotic theory see also Portnoy (1997) and van de Geer (2000). Using this estimate in the approach of Davies and Kovac would be more satisfactory than Sprague's method. He proposed just drawing curves by hand.
  • Diewert, W. E. and Wales, T. J. (1993). A "new" approach to the smoothing problem. Technical report.
  • Koenker, R., Ng, P. T. and Portnoy, S. (1994). Quantile smoothing splines. Biometrika 81 673-680.
  • Kuensch, H. R. (1994). Robust priors for smoothing and image restoration. Ann. Inst. Statist. Math. 46 1-19.
  • Mammen, E. and van de Geer, S. (1997). Locally adaptive regression splines. Ann. Statist. 25 387-413.
  • Portnoy, S. (1997). Local asymptotics for quantile smoothing splines. Ann. Statist. 25 414-434.
  • Sprague, T. B. (1887). The graphic method of adjusting mortality tables (with discussion). J. Inst. Actuaries 26 270-285.
  • Van de Geer, S. (2000). M-estimation using penalties or sieves. J. Statist. Plann. Inference. To appear.
  • and used for statistical inference in Chaudhuri and Marron (1999). Assuming the existence of a true underlying curve for a moment, an important part of the scale space view is that there is not enough information in the data to completely estimate the true curve. Hence, one should instead focus on other "targets" which reflect what aspects of the true curve are obtainable from the data. This seems to be a useful position that lies in between the two sides of the debate as to whether or not a "true underlying curve exists." The specifics on the interesting new approaches to curve estimation are connected to some ideas of Tukey. The "Tukey terminology" can be carried one
  • Chaudhuri, P. and Marron, J. S. (1999). SiZer for exploration of structures in curves. J. Amer. Statist. Assoc. 94 807-823.
  • Chaudhuri, P. and Marron, J. S. (2000). Scale space view of curve estimation. Ann. Statist. 28 408-428.
  • Green, P. and Silverman, B. (1994). Nonparametric Regression and Generalized Linear Models: a Roughness Penalty Approach. Chapman and Hall, London.
  • Wahba, G. (1990). Spline Models for Observational Data. SIAM, Philadelphia.
  • IT WOULD INDEED BE CARRIED THROUGH, which may or may not be possible, and (b) ITS ASSUMPTIONS WERE SECURELY
  • CORRECT, which they never are. One but also with an unmentioned (unmentionable) logical gap between the assumptions and the data.
  • Albert, D. (1992). Quantum Mechanics and Experience. Harvard Univ. Press.
  • Bell, J. (1987). Speakable and Unspeakable in Quantum Mechanics. Cambridge Univ. Press.
  • Berndl, K., Daumer, M., Dürr, D., Goldstein, S. and Zanghi, N. (1995). A survey of Bohmian mechanics. Nuovo Cimento B 110 737-750.
  • Davies, P. L. (1995). Data features. Statist. Neerlandica 49 185-245.
  • Goldstein, S. (1998a). Quantum theory without observers I. Physics Today 42-46.
  • Goldstein, S. (1998b). Quantum theory without observers II. Physics Today 38-42.
  • Kac, M. (1959). Statistical Independence in Probability, Analysis and Number Theory. Wiley, New York.
  • Mammen, E. and van de Geer, S. (1997). Locally adaptive regression splines. Ann. Statist. 25 387-413.
  • Tukey, J. (1993a). Exploratory analysis of variance as providing examples of strategic choices. In New Directions in Statistical Data Analysis and Robustness (S. Morgenthaler, E. Ronchetti and W. A. Stahel, eds.) Birkhäuser, Basel.
  • Tukey, J. (1993b). Issues relevant to an honest account of data-based inference, partially in the light of Laurie Davies' paper. Princeton Univ.