
Optimal variance estimation without estimating the mean function

Tiejun Tong, Yanyuan Ma, and Yuedong Wang


We study the least squares estimator in the residual variance estimation context. We show that the mean squared differences of paired observations are asymptotically normally distributed. We further establish that, by regressing the mean squared differences of these paired observations on the squared distances between paired covariates via a simple least squares procedure, the resulting variance estimator is not only asymptotically normal and root-$n$ consistent, but also reaches the optimal bound in terms of estimation variance. We also demonstrate the advantage of the least squares estimator in comparison with existing methods in terms of the second order asymptotic properties.
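The regression step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes an equally spaced design $x_i = i/n$, pairs observations by lag $k = 1, \dots, m$, and uses an unweighted straight-line fit. The function name `ls_variance_estimate`, the lag cutoff `m`, and the factor $1/2$ in the mean squared differences are choices made here for illustration. Under these assumptions the half mean squared difference at lag $k$ has expectation roughly $\sigma^2 + J (k/n)^2$ for a smooth mean function, so the fitted intercept serves as the variance estimate.

```python
import numpy as np


def ls_variance_estimate(y, m):
    """Estimate the residual variance as the intercept of a least squares fit.

    Assumes (for illustration only) an equally spaced design x_i = i/n and
    an unweighted fit over lags k = 1, ..., m.
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    if m < 2 or m >= n:
        raise ValueError("m must satisfy 2 <= m < n")

    lags = np.arange(1, m + 1)
    # Half the mean squared difference of observations that are k steps apart.
    s = np.array(
        [np.sum((y[k:] - y[:-k]) ** 2) / (2.0 * (n - k)) for k in lags]
    )
    # Squared distance between the paired covariates at lag k.
    d = (lags / n) ** 2

    # Fit s_k ~ intercept + slope * d_k; the intercept estimates sigma^2.
    slope, intercept = np.polyfit(d, s, 1)
    return intercept


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 2000
    x = np.arange(1, n + 1) / n
    # Hypothetical test signal with true residual variance 0.25.
    y = 5.0 * np.sin(2.0 * np.pi * x) + 0.5 * rng.standard_normal(n)
    print(ls_variance_estimate(y, m=10))
```

The mean function is never estimated: its contribution to the squared differences is absorbed by the slope term, which is the sense in which the title speaks of estimating the variance without estimating the mean.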

Article information

Bernoulli, Volume 19, Number 5A (2013), 1839-1854.

First available in Project Euclid: 5 November 2013

Digital Object Identifier: doi:10.3150/12-BEJ432

Keywords: asymptotic normality; difference-based estimator; generalized least squares; nonparametric regression; optimal bound; residual variance


Tong, Tiejun; Ma, Yanyuan; Wang, Yuedong. Optimal variance estimation without estimating the mean function. Bernoulli 19 (2013), no. 5A, 1839–1854. doi:10.3150/12-BEJ432.

