An efficient sampling strategy for selection of biobank samples using risk scores
Research output: Contribution to journal › Article
Aim: The aim of this study was to suggest a new sample-selection strategy based on risk scores in case-control studies with biobank data. Methods: An ongoing Swedish case-control study on fetal exposure to endocrine disruptors and overweight in early childhood was used as the empirical example. Cases were defined as children with a body mass index (BMI) ≥18 kg/m2 (n=545) at four years of age, and controls as children with a BMI of ≤17 kg/m2 (n=4472 available). The risk of being overweight was modelled using logistic regression on covariates available from the health examination, before any samples were selected from the biobank. A risk score was estimated for each child and categorised as low (0-5%), medium (6-13%) or high (≥14%) risk of being overweight. Results: The final risk-score model, with smoking during pregnancy (p=0.001), birth weight (p<0.001), BMI of both parents (p<0.001 for both), type of residence (p=0.04) and economic situation (p=0.12), yielded an area under the receiver operating characteristic curve of 67% (n=3945 with complete data). The case group (n=416) had the following risk-score profile: low (12%), medium (46%) and high risk (43%). Twice as many controls as cases were selected from each risk group, with further matching on sex. Computer simulations showed that the proposed selection strategy with stratification on risk scores yielded consistent improvements in statistical precision. Conclusions: Using risk scores based on available survey or register data as a basis for sample selection may improve possibilities to study heterogeneity of exposure effects in biobank-based studies.
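The selection step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes each child is a dictionary with a precomputed `risk` (the predicted probability from the logistic model) and a `sex` field, that scores are rounded to whole percent before banding, and that controls are drawn at random within each stratum; the function names `risk_group` and `select_controls` are hypothetical.

```python
import random
from collections import defaultdict


def risk_group(risk):
    """Map a predicted probability to the three risk bands in the abstract."""
    pct = round(risk * 100)  # assumption: scores are rounded to whole percent
    if pct <= 5:
        return "low"
    if pct <= 13:
        return "medium"
    return "high"


def select_controls(cases, controls, ratio=2, seed=0):
    """Draw `ratio` controls per case within each (risk group, sex) stratum."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible

    # Pool the available controls by stratum.
    pool = defaultdict(list)
    for child in controls:
        pool[(risk_group(child["risk"]), child["sex"])].append(child)

    # Count how many controls each stratum needs, given the case profile.
    needed = defaultdict(int)
    for child in cases:
        needed[(risk_group(child["risk"]), child["sex"])] += ratio

    # Sample without replacement; take everything if a stratum runs short.
    selected = []
    for stratum, n in needed.items():
        available = pool[stratum]
        selected.extend(rng.sample(available, min(n, len(available))))
    return selected
```

Called as `select_controls(cases, controls)`, this returns a control set whose risk-group and sex distribution mirrors that of the cases, which is the property the stratified strategy relies on. How the study handled strata with too few available controls is not stated in the abstract; the fallback here is an assumption.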
|Number of pages||4|
|Journal||Scandinavian Journal of Public Health|
|Publication status||Published - 2017 Jul 1|