TY - JOUR
T1 - Improved modeling of clinical data with kernel methods
AU - Daemen, Anneleen
AU - Timmerman, Dirk
AU - Van den Bosch, Thierry
AU - Bottomley, Cecilia
AU - Kirk, Emma
AU - Van Holsbeke, Caroline
AU - Valentin, Lil
AU - Bourne, Tom
AU - De Moor, Bart
PY - 2012
Y1 - 2012
N2 - Objective: Despite the rise of high-throughput technologies, clinical data such as age, gender and medical history guide clinical management for most diseases and examinations. To improve clinical management, available patient information should be fully exploited. This requires appropriate modeling of relevant parameters. Methods: When kernel methods are used, traditional kernel functions such as the linear kernel are often applied to the set of clinical parameters. These kernel functions, however, have disadvantages due to the specific characteristics of clinical data, which are a mix of variable types, each variable having its own range. We propose a new kernel function specifically adapted to the characteristics of clinical data. Results: The clinical kernel function provides a better representation of patients' similarity by equalizing the influence of all variables and taking into account the range r of the variables. Moreover, it is robust with respect to changes in r. Incorporated in a least squares support vector machine, the new kernel function results in significantly improved diagnosis, prognosis and prediction of therapy response. This is illustrated on four clinical data sets within gynecology, with an average increase in test area under the ROC curve (AUC) of 0.023, 0.021, 0.122 and 0.019, respectively. Moreover, when combining clinical parameters and expression data in three case studies on breast cancer, results improved overall with use of the new kernel function and when considering both data types in a weighted fashion, with a larger weight assigned to the clinical parameters. The maximum increase in AUC with respect to a standard kernel function and/or unweighted data combination was 0.127, 0.042 and 0.118 for the three case studies, respectively.
Conclusion: For clinical data consisting of variables of different types, the proposed kernel function, which takes into account the type and range of each variable, has been shown to be a better alternative for linear and non-linear classification problems. (C) 2011 Elsevier B.V. All rights reserved.
AB - Objective: Despite the rise of high-throughput technologies, clinical data such as age, gender and medical history guide clinical management for most diseases and examinations. To improve clinical management, available patient information should be fully exploited. This requires appropriate modeling of relevant parameters. Methods: When kernel methods are used, traditional kernel functions such as the linear kernel are often applied to the set of clinical parameters. These kernel functions, however, have disadvantages due to the specific characteristics of clinical data, which are a mix of variable types, each variable having its own range. We propose a new kernel function specifically adapted to the characteristics of clinical data. Results: The clinical kernel function provides a better representation of patients' similarity by equalizing the influence of all variables and taking into account the range r of the variables. Moreover, it is robust with respect to changes in r. Incorporated in a least squares support vector machine, the new kernel function results in significantly improved diagnosis, prognosis and prediction of therapy response. This is illustrated on four clinical data sets within gynecology, with an average increase in test area under the ROC curve (AUC) of 0.023, 0.021, 0.122 and 0.019, respectively. Moreover, when combining clinical parameters and expression data in three case studies on breast cancer, results improved overall with use of the new kernel function and when considering both data types in a weighted fashion, with a larger weight assigned to the clinical parameters. The maximum increase in AUC with respect to a standard kernel function and/or unweighted data combination was 0.127, 0.042 and 0.118 for the three case studies, respectively.
Conclusion: For clinical data consisting of variables of different types, the proposed kernel function, which takes into account the type and range of each variable, has been shown to be a better alternative for linear and non-linear classification problems. (C) 2011 Elsevier B.V. All rights reserved.
KW - Machine learning
KW - Support vector machine
KW - Kernel function
KW - Biostatistics
KW - Clinical data representation
KW - Clinical decision support system
KW - Gynecology
KW - Breast cancer
U2 - 10.1016/j.artmed.2011.11.001
DO - 10.1016/j.artmed.2011.11.001
M3 - Article
C2 - 22134094
SN - 1873-2860
VL - 54
SP - 103
EP - 114
JO - Artificial Intelligence in Medicine
JF - Artificial Intelligence in Medicine
IS - 2
ER -