TY - JOUR
T1 - Adaptive Bayesian SLOPE
T2 - Model Selection With Incomplete Data
AU - Jiang, Wei
AU - Bogdan, Małgorzata
AU - Josse, Julie
AU - Majewski, Szymon
AU - Miasojedow, Błażej
AU - Ročková, Veronika
AU - TraumaBase® Group
N1 - Publisher Copyright: © 2021 American Statistical Association, Institute of Mathematical Statistics, and Interface Foundation of North America.
PY - 2022
Y1 - 2022
N2 - We consider the problem of variable selection in high-dimensional settings with missing observations among the covariates. To address this relatively understudied problem, we propose a new synergistic procedure, adaptive Bayesian SLOPE with missing values, which combines SLOPE (sorted ℓ1 regularization) with the spike-and-slab LASSO (SSL) and is accompanied by an efficient stochastic approximation of expectation maximization (SAEM) algorithm to handle missing data. As in SSL, the regression coefficients are regarded as arising from a hierarchical model consisting of two groups: the spike for the inactive coefficients and the slab for the active ones. However, instead of assigning independent spike and slab Laplace priors to each covariate, we deploy a joint SLOPE “spike-and-slab” prior that takes into account the ordering of coefficient magnitudes in order to control false discoveries. We position our approach within a Bayesian framework that allows for simultaneous variable selection and parameter estimation while handling missing data. Through extensive simulations, we demonstrate satisfactory performance in terms of power, false discovery rate (FDR), and estimation bias under a wide range of scenarios, including both complete data and data with missing values. Finally, we analyze a real dataset of patients from Paris hospitals who suffered severe trauma, where we show competitive performance in predicting platelet levels. Our methodology has been implemented in C++ and wrapped into open source R programs for public use. Supplemental files for this article are available online.
KW - FDR control
KW - Health data
KW - Incomplete data
KW - Penalized regression
KW - Spike and slab prior
KW - Stochastic approximation EM
UR - http://www.scopus.com/inward/record.url?scp=85117475479&partnerID=8YFLogxK
U2 - 10.1080/10618600.2021.1963263
DO - 10.1080/10618600.2021.1963263
M3 - Article
AN - SCOPUS:85117475479
VL - 31
SP - 113
EP - 137
JO - Journal of Computational and Graphical Statistics
JF - Journal of Computational and Graphical Statistics
SN - 1537-2715
IS - 1
ER -