Injury severity prediction of cyclist crashes using random forests and random parameters logit models

Antonella Scarano, Maria Rella Riccardi, Filomena Mauriello, Carmelo D'Agostino, Nicola Pasquino, Alfonso Montella

Research output: Contribution to journal › Article › peer-review


Cycling provides numerous benefits to individuals and to society, but cyclists bear a disproportionate share of the burden of road traffic injuries and fatalities. Without awareness of the factors contributing to cyclist death and injury, the capability to implement appropriate, context-specific measures is severely limited. In this paper, we investigated the effects of characteristics related to the road, the environment, the vehicle involved, the driver, and the cyclist on the severity of crashes involving cyclists by analysing 72,363 crashes that occurred in Great Britain in the period 2016–2018. Both a machine learning method, the Random Forest (RF), and an econometric model, the Random Parameters Logit Model (RPLM), were implemented. Three RF algorithms were applied, namely the traditional RF, the Weighted Subspace RF, and the Random Survival Forest. The latter demonstrated superior predictive performance in terms of both F-measure and G-mean. The main result of the Random Survival Forest is the variable importance, which provides a ranked list of the predictors associated with fatal and severe cyclist crashes. For fatal classification, 19 variables showed a normalized importance higher than 5%, with the manoeuvring of the second involved vehicle and the gender of the driver of the second vehicle having the greatest predictive ability. For serious injury classification, 13 variables showed a normalized importance higher than 5%, with the bike leaving the carriageway having the greatest normalized importance. Furthermore, each path from the root node to the leaf nodes was retraced, generating 361 if-then rules with fatal crash as consequent and 349 if-then rules with serious injury crash as consequent.
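For readers unfamiliar with the two performance measures named above, the following is a minimal sketch of how F-measure and G-mean are commonly computed for one severity class treated as the positive class against all others. The abstract does not state the exact definitions used in the paper, so the one-vs-rest framing, the function name, and the toy labels are assumptions made for illustration only.

```python
import math

def f_measure_and_g_mean(y_true, y_pred, positive):
    """Return (F-measure, G-mean) for one class treated as positive.

    Assumed definitions (not taken from the paper):
      F-measure = harmonic mean of precision and recall
      G-mean    = sqrt(recall * specificity)
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0

    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    g_mean = math.sqrt(recall * specificity)
    return f_measure, g_mean

# Hypothetical toy labels, not data from the study.
y_true = ["fatal", "slight", "slight", "fatal", "slight", "slight"]
y_pred = ["fatal", "slight", "fatal", "slight", "slight", "slight"]
f, g = f_measure_and_g_mean(y_true, y_pred, positive="fatal")
```

Both measures are often preferred over plain accuracy when the positive class is rare, as fatal crashes typically are, because a classifier that never predicts the rare class scores zero on each.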
The RPLM showed significant unobserved heterogeneity in the data, identifying four normally distributed indicator variables with random parameters: cyclist age ≥ 75 (fatal prediction), cyclist gender male (fatal and serious prediction), and driver aged 55–64 (serious prediction). The model's McFadden pseudo-R² is equal to 0.21, indicating a very good fit. Furthermore, to understand the magnitude of the effects and the contribution of each variable to injury severity probabilities, pseudo-elasticities were assessed, giving insight into the relative importance and influence of the variables. The RF and the RPLM proved complementary in identifying several roadway, environmental, vehicle, driver, and cyclist-related factors associated with higher crash severity. Based on the identified contributory factors, safety countermeasures were recommended to support strategies for making cycling a safer and friendlier form of transport.
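As a rough illustration of the two RPLM quantities mentioned above, the sketch below computes McFadden's pseudo-R² from a pair of log-likelihoods and a pseudo-elasticity for an indicator variable. The formulas are the commonly used textbook forms and are assumed here, since the abstract does not give them; the choice of baseline model for the null log-likelihood is likewise an assumption, and all numbers are hypothetical rather than results from the study.

```python
def mcfadden_pseudo_r2(ll_model, ll_null):
    """McFadden pseudo-R^2 = 1 - LL(model) / LL(null).

    ll_null is the log-likelihood of a baseline model (assumed here;
    the paper's baseline is not stated in the abstract).
    """
    return 1.0 - ll_model / ll_null

def pseudo_elasticity(p_with, p_without):
    """Percent change in an outcome probability when an indicator
    variable switches from 0 to 1 (assumed common definition)."""
    return (p_with - p_without) / p_without * 100.0

# Hypothetical log-likelihoods chosen only so the result is 0.21.
r2 = mcfadden_pseudo_r2(ll_model=-790.0, ll_null=-1000.0)

# Hypothetical probabilities: the outcome probability rises
# from 2% to 3% when the indicator is switched on.
elasticity = pseudo_elasticity(p_with=0.03, p_without=0.02)
```

Because log-likelihoods are negative, a model that improves on the baseline has a ratio below one and therefore a pseudo-R² between 0 and 1; values in this range are not directly comparable to the R² of a linear regression.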

Original language: English
Article number: 107275
Journal: Accident Analysis and Prevention
Publication status: Published - 2023 Nov

Bibliographical note

Publisher Copyright:
© 2023 The Authors

Subject classification (UKÄ)

  • Transport Systems and Logistics

Free keywords

  • Active travel
  • Crash contributory factors
  • Cyclist safety
  • Econometric models
  • Machine learning
  • Safety countermeasures

