Constructing Prognostic Gene Expression Profiles for Breast Cancer Survival

Derick R. Peterson, PhD
Associate Professor of Biostatistics & Computational Biology and Oncology
Director, Biostatistics Shared Resource for the James P. Wilmot Cancer Center

Modern microarray technologies allow us to simultaneously measure the expression levels of a huge number of genes, some of which are likely to be associated with cancer survival. While gene expression measurements are unlikely ever to completely replace important clinical covariates, evidence is mounting that they can provide significant additional predictive information. The difficult task is to search among an enormous number of potential predictors and to correctly identify most of the important ones, without mistakenly identifying too many spurious associations. Unfortunately, many commonly used screening procedures over-fit the training data, selecting subsets of genes that are unrelated to survival in the target population despite appearing associated with the outcome in the particular sample used for subset selection. Moreover, some genes might be useful only when used in concert with certain other genes or with clinical covariates, yet most available screening methods are inherently univariate, based only on the marginal association between each predictor and the outcome. While it is impossible to simultaneously adjust for a huge number of predictors in an unconstrained way, we propose a method that offers a middle ground, in which some partial adjustments can be made adaptively, regardless of the number of candidate predictors.
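
The over-fitting concern with univariate screening can be illustrated with a small simulation. The sketch below is not the proposed method, which the abstract does not specify; it is a simplified stand-in that assumes a continuous, uncensored outcome and uses Pearson correlation as the marginal association measure. All names (marginal_abs_corr, screening_demo) and the sizes n, p, and k are hypothetical choices made for illustration. Every simulated predictor is pure noise, so any apparent association among the selected "genes" is spurious by construction.

```python
import numpy as np


def marginal_abs_corr(X, y):
    """Absolute Pearson correlation between each column of X and y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    num = Xc.T @ yc
    den = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    return np.abs(num / den)


def screening_demo(n=100, p=2000, k=10, seed=0):
    """Select the top-k noise predictors on training data, then re-check them.

    Assumption: the outcome is continuous and independent of every predictor,
    so the true association of each selected predictor is zero.
    """
    rng = np.random.default_rng(seed)
    X_train = rng.standard_normal((n, p))
    y_train = rng.standard_normal(n)
    X_test = rng.standard_normal((n, p))
    y_test = rng.standard_normal(n)

    # Univariate screening: rank predictors by marginal association.
    train_corr = marginal_abs_corr(X_train, y_train)
    top = np.argsort(train_corr)[-k:]

    # Evaluate the same predictors on independent data.
    test_corr = marginal_abs_corr(X_test, y_test)
    return train_corr[top].mean(), test_corr[top].mean()


train_mean, test_mean = screening_demo()
print(f"mean |r| of selected predictors, training: {train_mean:.2f}")
print(f"mean |r| of selected predictors, test:     {test_mean:.2f}")
```

In this setup the selected predictors look clearly associated with the outcome in the training sample, and that apparent association largely disappears on independent data, which is the pattern the abstract warns about.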