
Empirical Performance of Cross-Validation With Oracle Methods in a Genomics Context.
Abstract
When employing model selection methods with oracle properties, such as the smoothly clipped absolute deviation (SCAD) and the adaptive Lasso, it is typical to estimate the smoothing parameter by m-fold cross-validation, for example, m = 10. In problems where the true regression function is sparse and the signals are large, such cross-validation typically works well. However, in regression modeling of genomic studies involving single nucleotide polymorphisms (SNPs), the true regression functions, while thought to be sparse, do not have large signals. We demonstrate empirically that in such problems, the number of variables selected by SCAD and the adaptive Lasso with 10-fold cross-validation is a random variable with considerable and surprising variation. Similar remarks apply to non-oracle methods such as the Lasso. Our study strongly questions the suitability of performing only a single run of m-fold cross-validation with any oracle method, not just SCAD and the adaptive Lasso.
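The point about repeated cross-validation can be illustrated with a small sketch. It is not the authors' code or data: it uses the plain Lasso (which the abstract says behaves similarly) rather than SCAD or the adaptive Lasso, a hand-written coordinate-descent solver, and simulated SNP-like genotypes. The sample size, effect size of 0.3, penalty grid, and all function names are assumptions chosen only for illustration. The same data set is passed through 10-fold cross-validation several times, each time with a fresh random fold assignment, and the number of selected variables is recorded per run.

```python
import random


def soft_threshold(z, t):
    """Soft-thresholding operator used by the Lasso coordinate update."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, sweeps=30, tol=1e-8):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam * ||b||_1 (no intercept)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(sweeps):
        max_step = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # constant column carries no information
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * X[i][j]
                beta[j] = new
                max_step = max(max_step, abs(delta))
        if max_step < tol:
            break
    return beta


def cv_select(X, y, lams, m, rng):
    """One run of m-fold CV: return (chosen penalty, number of nonzero coefficients)."""
    n = len(X)
    idx = list(range(n))
    rng.shuffle(idx)  # the random fold assignment is the source of run-to-run variation
    folds = [idx[k::m] for k in range(m)]
    cv_err = []
    for lam in lams:
        sse = 0.0
        for held in folds:
            held_set = set(held)
            train = [i for i in idx if i not in held_set]
            beta = lasso_cd([X[i] for i in train], [y[i] for i in train], lam)
            for i in held:
                pred = sum(b * x for b, x in zip(beta, X[i]))
                sse += (y[i] - pred) ** 2
        cv_err.append(sse / n)
    best = lams[min(range(len(lams)), key=cv_err.__getitem__)]
    beta = lasso_cd(X, y, best)  # refit on all data at the chosen penalty
    return best, sum(1 for b in beta if b != 0.0)


def simulate(n, p, rng):
    """Hypothetical SNP-like design: genotypes in {0, 1, 2}, three weak true effects."""
    X = [[float(sum(rng.random() < 0.3 for _ in range(2))) for _ in range(p)]
         for _ in range(n)]
    means = [sum(row[j] for row in X) / n for j in range(p)]
    X = [[row[j] - means[j] for j in range(p)] for row in X]
    y = [0.3 * (row[0] + row[1] + row[2]) + rng.gauss(0.0, 1.0) for row in X]
    ybar = sum(y) / n
    return X, [v - ybar for v in y]


def repeated_cv_counts(n=50, p=10, m=10, repeats=5, seed=0):
    """Repeat m-fold CV on one fixed data set; return the selected-variable counts."""
    rng = random.Random(seed)
    X, y = simulate(n, p, rng)
    lams = [0.4, 0.3, 0.2, 0.15, 0.1, 0.05]  # assumed penalty grid
    return [cv_select(X, y, lams, m, rng)[1] for _ in range(repeats)]


counts = repeated_cv_counts()
print(counts)
```

Because the data are held fixed across runs, any differences among the printed counts come only from how the observations were split into folds, which is the quantity the abstract describes as a random variable. Increasing `repeats` gives a fuller picture of that distribution.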
Related Articles
High-dimensional Cox models: the choice of penalty as part of the model building process.
Benner A, Zucknick M, Hielscher T, Ittrich C, Mansmann U. Biom J. 2010 Feb; 52(1):50-69.
PENALIZED VARIABLE SELECTION PROCEDURE FOR COX MODELS WITH SEMIPARAMETRIC RELATIVE RISK.
Du P, Ma S, Liang H. Ann Stat. 2010 Aug 1; 38(4):2092-2117.
VARIABLE SELECTION FOR REGRESSION MODELS WITH MISSING DATA.
Garcia RI, Ibrahim JG, Zhu H. Stat Sin. 2010 Jan; 20(1):149-165.
Review: Methods of plant breeding in the genome era.
Xu S, Hu Z. Genet Res (Camb). 2010 Dec; 92(5-6):423-41.
Review: [From population genetics to population genomics of forest trees: integrated population genomics approach].
Krutovskiĭ KV. Genetika. 2006 Oct; 42(10):1304-18.
