The Annals of Statistics

Gaussian model selection with an unknown variance

Yannick Baraud, Christophe Giraud, and Sylvie Huet

Full-text: Open access


Let $Y$ be a Gaussian vector whose components are independent with a common unknown variance. We consider the problem of estimating the mean $\mu$ of $Y$ by model selection. More precisely, we start with a collection $\mathcal{S}=\{S_{m},m\in\mathcal{M}\}$ of linear subspaces of $\mathbb{R}^{n}$ and associate with each of them the least-squares estimator of $\mu$ on $S_m$. We then use a data-driven penalized criterion to select one estimator among these. Our first objective is to analyze the performance of the estimators associated with classical criteria such as FPE, AIC, BIC and AMDL. Our second objective is to propose better penalties that are versatile enough to take into account both the complexity of the collection $\mathcal{S}$ and the sample size. We then apply these penalties to various statistical problems such as variable selection, change-point detection and signal estimation, among others. Our results are based on a nonasymptotic risk bound, with respect to the Euclidean loss, for the selected estimator. Some analogous results are also established for the Kullback loss.
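To make the selection step concrete, here is a minimal sketch of choosing among least-squares estimators with classical criteria when the variance is unknown. It assumes a hypothetical nested collection $S_d = \mathrm{span}\{e_1,\dots,e_d\}$ of coordinate subspaces, for which the least-squares fit simply keeps the first $d$ coordinates of $Y$. The criteria are written in their standard textbook forms (FPE, AIC and BIC for Gaussian regression with estimated variance), which may differ from the parameterization used in the paper; AMDL and the penalties proposed by the authors are not implemented. The function names `rss_nested`, `criteria` and `select` are illustrative only.

```python
import math
import random


def rss_nested(y, d):
    """Residual sum of squares of the least-squares fit on the hypothetical
    subspace S_d = span{e_1, ..., e_d}: projecting onto S_d keeps the first
    d coordinates of y, so the residual consists of the remaining ones."""
    return sum(v * v for v in y[d:])


def criteria(y, d):
    """Classical criteria (smaller is better) for a model of dimension d
    when the noise variance is unknown and replaced by RSS-based estimates.
    Standard textbook forms, not necessarily the paper's parameterization."""
    n = len(y)
    rss = rss_nested(y, d)
    return {
        # Akaike's FPE, multiplied by n (does not change the argmin).
        "FPE": rss * (n + d) / (n - d),
        # Gaussian AIC with estimated variance, up to additive constants.
        "AIC": n * math.log(rss / n) + 2 * d,
        # Schwarz's BIC, up to additive constants.
        "BIC": n * math.log(rss / n) + d * math.log(n),
    }


def select(y, max_dim, name):
    """Return the dimension d in {0, ..., max_dim} minimizing criterion `name`."""
    if not 0 <= max_dim < len(y):
        raise ValueError("max_dim must satisfy 0 <= max_dim < len(y)")
    return min(range(max_dim + 1), key=lambda d: criteria(y, d)[name])


if __name__ == "__main__":
    random.seed(0)
    n = 100
    # Hypothetical mean with 5 nonzero coordinates; unit-variance noise.
    mu = [3.0] * 5 + [0.0] * (n - 5)
    y = [m + random.gauss(0.0, 1.0) for m in mu]
    for name in ("FPE", "AIC", "BIC"):
        print(name, select(y, 50, name))
```

Because the collection here is nested, it contains a single model per dimension, so a penalty depending only on $d$ and $n$ is reasonable; the abstract's point is that richer collections (for example all subsets of variables, or all change-point configurations) call for penalties that also account for the complexity of $\mathcal{S}$.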

Article information

Ann. Statist., Volume 37, Number 2 (2009), 630-672.

First available in Project Euclid: 10 March 2009

Mathematics Subject Classification: Primary 62G08 (Nonparametric regression)

Keywords: model selection; penalized criterion; AIC; FPE; BIC; AMDL; variable selection; change-point detection; adaptive estimation


Baraud, Yannick; Giraud, Christophe; Huet, Sylvie. Gaussian model selection with an unknown variance. Ann. Statist. 37 (2009), no. 2, 630–672. doi:10.1214/07-AOS573.

  • Abramovich, F., Benjamini, Y., Donoho, D. and Johnstone, I. (2006). Adapting to unknown sparsity by controlling the false discovery rate. Ann. Statist. 34 584–653.
  • Akaike, H. (1969). Statistical predictor identification. Ann. Inst. Statist. Math. 22 203–217.
  • Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. In 2nd International Symposium on Information Theory (B. N. Petrov and F. Csáki, eds.) 267–281. Akadémiai Kiadó, Budapest.
  • Akaike, H. (1978). A Bayesian analysis of the minimum AIC procedure. Ann. Inst. Statist. Math. Part A 30 9–14.
  • Baraud, Y., Huet, S. and Laurent, B. (2003). Adaptive tests of linear hypotheses by model selection. Ann. Statist. 31 225–251.
  • Baraud, Y., Giraud, C. and Huet, S. (2007). Gaussian model selection with unknown variance. Technical report. Available at arXiv:math/0701250.
  • Barron, A. R. (1991). Complexity regularization with applications to artificial neural networks. In Nonparametric Functional Estimation (G. Roussas, ed.) 561–576. Kluwer, Dordrecht.
  • Barron, A., Birgé, L. and Massart, P. (1999). Risk bounds for model selection via penalization. Probab. Theory Related Fields 113 301–413.
  • Barron, A. R. and Cover, T. M. (1991). Minimum complexity density estimation. IEEE Trans. Inform. Theory 37 1034–1054.
  • Birgé, L. and Massart, P. (1997). From model selection to adaptive estimation. In Festschrift for Lucien Le Cam: Research Papers in Probability and Statistics (D. Pollard, E. Torgersen and G. Yang, eds.) 55–87. Springer, New York.
  • Birgé, L. and Massart, P. (2001a). Gaussian model selection. J. Eur. Math. Soc. 3 203–268.
  • Birgé, L. and Massart, P. (2007). Minimal penalties for Gaussian model selection. Probab. Theory Related Fields 138 33–73.
  • Candès, E. and Tao, T. (2007). The Dantzig selector: Statistical estimation when p is much larger than n. Ann. Statist. 35 2313–2351.
  • Donoho, D. and Johnstone, I. (1994). Ideal spatial adaptation by wavelet shrinkage. Biometrika 81 425–455.
  • Efron, B., Hastie, T., Johnstone, I. and Tibshirani, R. (2004). Least angle regression. Ann. Statist. 32 407–499.
  • Huet, S. (2006). Model selection for estimating the nonzero components of a Gaussian vector. ESAIM Probab. Statist. 10 164–183.
  • Ibragimov, I. A. and Khas’minskii, R. Z. (1981). On the nonparametric density estimates. Zap. Nauchn. Semin. LOMI 108 73–89.
  • Laurent, B. and Massart, P. (2000). Adaptive estimation of a quadratic functional by model selection. Ann. Statist. 28 1302–1338.
  • Lebarbier, E. (2005). Detecting multiple change-points in the mean of Gaussian process by model selection. Signal Processing 85 717–736.
  • Mallows, C. L. (1973). Some comments on Cp. Technometrics 15 661–675.
  • McQuarrie, A. D. R. and Tsai, C. L. (1998). Regression and Time Series Model Selection. World Scientific Publishing, River Edge, NJ.
  • Rissanen, J. (1983). A universal prior for integers and estimation by minimum description length. Ann. Statist. 11 416–431.
  • Rissanen, J. (1984). Universal coding, information, prediction and estimation. IEEE Trans. Inform. Theory 30 629–636.
  • Saito, N. (1994). Simultaneous noise suppression and signal compression using a library of orthogonal bases and the minimum description length criterion. In Wavelets in Geophysics (E. Foufoula-Georgiou and P. Kumar, eds.) 299–324. Academic Press, San Diego, CA.
  • Schwarz, G. (1978). Estimating the dimension of a model. Ann. Statist. 6 461–464.
  • Tibshirani, R. (1996). Regression shrinkage and selection via the Lasso. J. Roy. Statist. Soc. Ser. B 58 267–288.
  • Zou, H. (2006). The adaptive Lasso and its oracle properties. J. Amer. Statist. Assoc. 101 1418–1429.