The Annals of Statistics

On methods of sieves and penalization

Xiaotong Shen


We develop a general theory which provides a unified treatment for the asymptotic normality and efficiency of the maximum likelihood estimates (MLE's) in parametric, semiparametric and nonparametric models. We find that the asymptotic behavior of substitution estimates of smooth functionals is essentially governed by two indices: the degree of smoothness of the functional and the local size of the underlying parameter space. We show that when the local size of the parameter space is not very large, the substitution standard (nonsieve), substitution sieve and substitution penalized MLE's are asymptotically efficient in the Fisher sense, under certain stochastic equicontinuity conditions on the log-likelihood. Moreover, when the convergence rate of the estimate is slow, the degree of smoothness of the functional needs to compensate for the slowness of the rate in order to achieve efficiency. When the size of the parameter space is very large, the standard and penalized maximum likelihood procedures may be inefficient, whereas the method of sieves may be able to overcome this difficulty. This phenomenon is particularly manifest when the functional of interest is very smooth, especially in the semiparametric case.

Article information

Ann. Statist., Volume 25, Number 6 (1997), 2555-2591.

First available in Project Euclid: 30 August 2002

Primary: 62G05: Estimation
Secondary: 62A10

Keywords: Asymptotic normality, efficiency, maximum likelihood estimation, methods of sieves and penalization, constraints, substitution, nonparametric and semiparametric models


Shen, Xiaotong. On methods of sieves and penalization. Ann. Statist. 25 (1997), no. 6, 2555--2591. doi:10.1214/aos/1030741085.
