Open Access
2016 Designing penalty functions in high dimensional problems: The role of tuning parameters
Ting-Huei Chen, Wei Sun, Jason P. Fine
Electron. J. Statist. 10(2): 2312-2328 (2016). DOI: 10.1214/16-EJS1169

Abstract

Various forms of penalty functions have been developed for regularized estimation and variable selection. Screening approaches are often used to reduce the number of covariates before penalized estimation. However, in certain problems, the number of covariates remains large after screening. For example, in genome-wide association (GWA) studies, the purpose is to identify single nucleotide polymorphisms (SNPs) that are associated with certain traits, and typically there are millions of SNPs and thousands of samples. Because of the strong correlation among nearby SNPs, screening can only reduce the number of SNPs from millions to tens of thousands, and the variable selection problem remains very challenging. Several penalty functions have been proposed for such high dimensional data. However, it is unclear which class of penalty functions is the appropriate choice for a particular application. In this paper, we conduct a theoretical analysis to relate the ranges of tuning parameters of various penalty functions to the dimensionality of the problem and the minimum effect size. We illustrate our theoretical results for several penalty functions. The results suggest that a class of penalty functions that bridges $L_{0}$ and $L_{1}$ penalties requires less restrictive conditions on dimensionality and minimum effect sizes in order to attain the two fundamental goals of penalized estimation: to shrink all noise coefficients to zero and to obtain unbiased estimation of the true signals. Penalties such as SICA and Log belong to this class, but they have not been used often in applications. The simulation and real data analysis using GWAS data suggest the promising applicability of this class of penalties.
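
As a rough illustration of what "bridging $L_{0}$ and $L_{1}$" can mean, the sketch below evaluates a SICA-type transform at a very small and a very large shape parameter. The abstract does not state the formula; the form $\rho_a(t) = (a+1)|t|/(a+|t|)$ with $a > 0$ is the one usually given for SICA in the literature and is assumed here, without the tuning parameter that multiplies it in penalized estimation. The function name `sica_penalty` and the example coefficients are hypothetical.

```python
import numpy as np


def sica_penalty(t, a):
    """Assumed SICA-type transform: rho_a(t) = (a + 1)|t| / (a + |t|), a > 0."""
    t = np.abs(np.asarray(t, dtype=float))
    return (a + 1.0) * t / (a + t)


# Hypothetical coefficient values, chosen only for illustration.
coefs = np.array([0.0, 0.1, 0.5, 1.0, 2.0])

# Small a: the transform is close to the L0 indicator 1{t != 0}.
near_l0 = sica_penalty(coefs, a=1e-6)

# Large a: the transform is close to the L1 penalty |t|.
near_l1 = sica_penalty(coefs, a=1e6)

print(np.round(near_l0, 3))
print(np.round(near_l1, 3))
```

Under this assumed form, the shape parameter moves the penalty continuously between counting nonzero coefficients and summing their absolute values, which is the sense in which such a class sits between the two extremes.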

Citation

Ting-Huei Chen, Wei Sun, Jason P. Fine. "Designing penalty functions in high dimensional problems: The role of tuning parameters." Electron. J. Statist. 10(2): 2312-2328, 2016. https://doi.org/10.1214/16-EJS1169

Information

Received: 1 April 2014; Published: 2016
First available in Project Euclid: 29 August 2016

zbMATH: 06624518
MathSciNet: MR3541973
Digital Object Identifier: 10.1214/16-EJS1169

Keywords: Folded-concave penalties, Genome-wide association studies, tuning parameter selection

Rights: Copyright © 2016 The Institute of Mathematical Statistics and the Bernoulli Society
