A review of data-driven approaches to predict train delays

Research output: Contribution to journal, Review article, peer-review


Accurate train delay prediction is vital for effective railway traffic planning and management as well as for providing satisfactory passenger service quality. Despite significant advances in data-driven train delay prediction, the field lacks a systematic review of studies and a unified framework for model development. This paper reviews existing studies with an explicit focus on synthesizing a structural framework that could guide the development of effective data-driven train delay prediction models. The framework consists of three stages: design concept, modelling, and evaluation. The study synthesizes and discusses six important modules of the framework: (1) Problem scope, (2) Model inputs, (3) Data quality, (4) Methodologies, (5) Model outputs, and (6) Evaluation techniques. For each module, the important problems and techniques reported are synthesized and research gaps are discussed. The review found that most studies focus on developing complex methodologies for next-stop delay predictions, which have limited practical application. All studies validate model accuracy, but very few consider other aspects of model performance, which makes it difficult to assess their usefulness in practical deployment. Future studies need to take a holistic view when defining the train delay prediction problem, considering both application requirements and implementation challenges. Modelling studies should also pay more attention to data quality and to comprehensive model evaluation in terms of representation power, explainability and validity.

Original language: English
Article number: 104027
Journal: Transportation Research Part C: Emerging Technologies
Publication status: Published - 2023 Mar

Subject classification (UKÄ)

  • Transport Systems and Logistics
  • Computer Systems

Free keywords

  • Data-driven prediction
  • Railway operations and information
  • Technical development
  • Train delay prediction

