Improved modeling of clinical data with kernel methods

Anneleen Daemen, Dirk Timmerman, Thierry Van den Bosch, Cecilia Bottomley, Emma Kirk, Caroline Van Holsbeke, Lil Valentin, Tom Bourne, Bart De Moor

Research output: Contribution to journal (article, peer-reviewed)


Abstract

Objective: Despite the rise of high-throughput technologies, clinical data such as age, gender and medical history guide clinical management for most diseases and examinations. To improve clinical management, available patient information should be fully exploited. This requires appropriate modeling of relevant parameters.

Methods: When kernel methods are used, traditional kernel functions such as the linear kernel are often applied to the set of clinical parameters. These kernel functions, however, are poorly suited to the specific characteristics of clinical data, which mix variable types, each variable having its own range. We propose a new kernel function specifically adapted to the characteristics of clinical data.

Results: The clinical kernel function provides a better representation of patients' similarity by equalizing the influence of all variables and taking into account the range r of the variables. Moreover, it is robust with respect to changes in r. Incorporated in a least squares support vector machine, the new kernel function results in significantly improved diagnosis, prognosis and prediction of therapy response. This is illustrated on four clinical data sets within gynecology, with an average increase in test area under the ROC curve (AUC) of 0.023, 0.021, 0.122 and 0.019, respectively. Moreover, when combining clinical parameters and expression data in three case studies on breast cancer, results improved overall when the new kernel function was used and when both data types were combined in a weighted fashion, with a larger weight assigned to the clinical parameters. The increase in AUC with respect to a standard kernel function and/or unweighted data combination was at most 0.127, 0.042 and 0.118 for the three case studies.

Conclusion: For clinical data consisting of variables of different types, the proposed kernel function, which takes into account the type and range of each variable, has been shown to be a better alternative for linear and non-linear classification problems. (C) 2011 Elsevier B.V. All rights reserved.
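
The abstract does not state the kernel formula, so the following is only a minimal sketch of how a range-aware kernel of this kind might look. It assumes a per-variable similarity of (r - |x - z|) / r for continuous and ordinal variables, an exact-match indicator for nominal variables, and an unweighted average across variables so that each variable has equal influence. The function names, the example patients and the weight of 0.8 are hypothetical and are not taken from the paper.

```python
def clinical_kernel(x, z, ranges, nominal):
    """Average per-variable similarity between two patients.

    Assumed form, not quoted from the paper:
      ranges[i]  : range r_i of variable i (e.g. max - min on the training set)
      nominal[i] : True if variable i is nominal (unordered categories)
    """
    total = 0.0
    for xi, zi, r, is_nominal in zip(x, z, ranges, nominal):
        if is_nominal:
            # Nominal variables: similarity 1 on an exact match, else 0.
            total += 1.0 if xi == zi else 0.0
        else:
            # Continuous/ordinal variables: distance scaled by the range.
            total += (r - abs(xi - zi)) / r
    # Averaging gives every variable the same influence.
    return total / len(x)


def weighted_combination(k_clinical, k_expression, weight):
    """Convex combination of two kernel values; `weight` favors clinical data."""
    return weight * k_clinical + (1.0 - weight) * k_expression


# Hypothetical patients: (age in years, gender code, ordinal grade).
ranges = (40.0, 1.0, 4.0)        # the range entry is ignored for nominal variables
nominal = (False, True, False)
patient_a = (30.0, 0, 2)
patient_b = (50.0, 0, 3)

k_ab = clinical_kernel(patient_a, patient_b, ranges, nominal)
k_mixed = weighted_combination(k_ab, 0.25, weight=0.8)  # 0.25 is a placeholder expression-kernel value
```

For the two hypothetical patients the per-variable similarities are 0.5 (age), 1.0 (gender) and 0.75 (grade), so `k_ab` is 0.75. Because every term lies in [0, 1], no single variable with a large numeric range can dominate the similarity, which is the behavior the abstract attributes to the clinical kernel.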
Original language: English
Pages (from-to): 103-114
Journal: Artificial Intelligence in Medicine
Volume: 54
Issue number: 2
Publication status: Published - 2012

Subject classification (UKÄ)

  • Obstetrics, Gynecology and Reproductive Medicine

Free keywords

  • Machine learning
  • Support vector machine
  • Kernel function
  • Biostatistics
  • Clinical data representation
  • Clinical decision support system
  • Gynecology
  • Breast cancer
