Big Data Approaches to Phenotyping Acute Ischemic Stroke Using Automated Lesion Segmentation of Multi-Center Magnetic Resonance Imaging Data

Research output: Contribution to journal › Article


Background and Purpose: We evaluated deep learning algorithms' segmentation of acute ischemic lesions on heterogeneous multi-center clinical diffusion-weighted magnetic resonance imaging (MRI) data sets and explored the potential role of this tool for phenotyping acute ischemic stroke.

Methods: Ischemic stroke data sets from the MRI-GENIE (MRI-Genetics Interface Exploration) repository, consisting of 12 international genetic research centers, were retrospectively analyzed using an automated deep learning segmentation algorithm consisting of an ensemble of 3-dimensional convolutional neural networks. Three ensembles were trained using data from the following: (1) 267 patients from an independent single-center cohort, (2) 267 patients from MRI-GENIE, and (3) a mixture of (1) and (2). The algorithms' performances were compared against manual outlines from a separate 383-patient subset from MRI-GENIE. Univariable and multivariable logistic regression with respect to demographics, stroke subtypes, and vascular risk factors were performed to identify phenotypes associated with large acute diffusion-weighted MRI volumes and greater stroke severity in 2770 MRI-GENIE patients. Stroke topography was investigated.

Results: The ensemble consisting of a mixture of MRI-GENIE and single-center convolutional neural networks performed best. Subset analysis comparing automated and manual lesion volumes in 383 patients found excellent correlation (ρ=0.92; P<0.0001). Median (interquartile range) diffusion-weighted MRI lesion volumes from 2770 patients were 3.7 cm³ (0.9-16.6 cm³). Patients with small artery occlusion stroke subtype had smaller lesion volumes (P<0.0001) and different topography compared with other stroke subtypes.

Conclusions: Automated accurate clinical diffusion-weighted MRI lesion segmentation using deep learning algorithms trained with multi-center and diverse data is feasible. Both lesion volume and topography can provide insight into stroke subtypes with sufficient sample size from big heterogeneous multi-center clinical imaging phenotype data sets.
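As a rough illustration of how an ensemble's voxel-wise outputs might be turned into a lesion volume in cm³, the following minimal sketch averages per-voxel lesion probabilities across networks and counts the voxels above a cutoff. The abstract does not state how the ensemble members were combined, so the averaging rule, the 0.5 threshold, the voxel dimensions, and the function names `ensemble_mean` and `lesion_volume_cm3` are all assumptions made for illustration, and the probability values are made up.

```python
from typing import Sequence


def ensemble_mean(prob_maps: Sequence[Sequence[float]]) -> list[float]:
    """Average per-voxel lesion probabilities across ensemble members.

    Each inner sequence is one network's output over the same flattened
    volume. Simple averaging is an assumption, not the published method.
    """
    n_models = len(prob_maps)
    return [sum(voxel) / n_models for voxel in zip(*prob_maps)]


def lesion_volume_cm3(
    probs: Sequence[float],
    voxel_dims_mm: tuple[float, float, float],
    threshold: float = 0.5,  # hypothetical cutoff
) -> float:
    """Count voxels at or above the threshold and convert mm^3 to cm^3."""
    dx, dy, dz = voxel_dims_mm
    n_lesion = sum(1 for p in probs if p >= threshold)
    return n_lesion * dx * dy * dz / 1000.0


# Three hypothetical networks, four voxels each (illustrative values only).
maps = [
    [0.9, 0.2, 0.6, 0.1],
    [0.8, 0.4, 0.3, 0.0],
    [0.7, 0.6, 0.9, 0.2],
]
mean_probs = ensemble_mean(maps)
# Hypothetical voxel size of 1 x 1 x 5 mm.
volume = lesion_volume_cm3(mean_probs, voxel_dims_mm=(1.0, 1.0, 5.0))
print(f"Lesion volume: {volume:.3f} cm3")
```

In this toy case two of the four averaged voxels reach the cutoff, so the volume is two voxels of 5 mm³ each. Real diffusion-weighted volumes would be 3-dimensional arrays with voxel dimensions read from the image header.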


  • MRI-GENIE and GISCOME Investigators
  • Ona Wu
  • Stefan Winzeck
  • Anne-Katrin Giese
  • Brandon L Hancock
  • Mark R Etherton
  • Mark J R J Bouts
  • Kathleen Donahue
  • Markus D Schirmer
  • Robert E Irie
  • Steven J T Mocking
  • Elissa C McIntosh
  • Raquel Bezerra
  • Konstantinos Kamnitsas
  • Petrea Frid
  • Johan Wasselius
  • John W Cole
  • Huichun Xu
  • Lukas Holmegaard
  • Jordi Jiménez-Conde
  • Robin Lemmens
  • Eric Lorentzen
  • Patrick F McArdle
  • James F Meschia
  • Jaume Roquer
  • Tatjana Rundek
  • Ralph L Sacco
  • Reinhold Schmidt
  • Pankaj Sharma
  • Agnieszka Slowik
  • Tara M Stanne
  • Vincent Thijs
  • Achala Vagal
  • Daniel Woo
  • Stephen Bevan
  • Steven J Kittner
  • Braxton D Mitchell
  • Jonathan Rosand
  • Bradford B Worrall
  • Christina Jern
  • Arne G Lindgren
  • Jane Maguire
  • Natalia S Rost
External organisations
  • University of Cambridge
  • Skåne University Hospital
  • University Hospitals Leuven
  • Sahlgrenska Academy
  • University of Miami
  • Ashford and St Peter's Hospital
  • Jagiellonian University
  • University of Lincoln
  • University of Technology Sydney
  • Massachusetts General Hospital
  • Imperial College London
  • University of Maryland, Baltimore
  • Autonomous University of Barcelona
  • Mayo Clinic Florida
  • Medical University of Graz
  • Royal Holloway University of London
  • University of Cincinnati
  • Baltimore Veterans Administration Medical Center
  • University of Virginia
  • University of Gothenburg
Research areas and keywords

Subject classification (UKÄ)

  • Neurosciences
Original language: English
Pages (from-to): 1734-1741
Issue number: 7
Publication status: E-pub ahead of print - 2019 Jun 10
Publication category: Research