A Comprehensive Evaluation of Consensus Spectrum Generation Methods in Proteomics

Xiyang Luo, Wout Bittremieux, Johannes Griss, Eric W. Deutsch, Timo Sachsenberg, Lev I. Levitsky, Mark V. Ivanov, Julia A. Bubis, Ralf Gabriels, Henry Webel, Aniel Sanchez, Mingze Bai, Lukas Käll, Yasset Perez-Riverol

Research output: Contribution to journal › Article › peer-review

Abstract

Spectrum clustering is a powerful strategy to minimize redundant mass spectra by grouping them based on similarity, with the aim of forming groups of mass spectra from the same repeatedly measured analytes. Each such group of near-identical spectra can be represented by its so-called consensus spectrum for downstream processing. Although several algorithms for spectrum clustering have been adequately benchmarked and tested, the influence of the consensus spectrum generation step is rarely evaluated. Here, we present an implementation and benchmark of common consensus spectrum algorithms, including spectrum averaging, spectrum binning, the most similar spectrum, and the best-identified spectrum. We have analyzed diverse public data sets using two different clustering algorithms (spectra-cluster and MaRaCluster) to evaluate how the consensus spectrum generation procedure influences downstream peptide identification. The BEST and BIN methods were found to be the most reliable methods for consensus spectrum generation, including for data sets with post-translational modifications (PTMs) such as phosphorylation. All source code and data of the present study are freely available on GitHub at https://github.com/statisticalbiotechnology/representative-spectra-benchmark.
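To make the strategies named in the abstract concrete, the following is a minimal illustrative sketch of three of them (binning, most similar spectrum, best-identified spectrum). It is not the authors' implementation, which is available in the GitHub repository above. The function names, the peak representation as (m/z, intensity) tuples, the bin width, the `min_fraction` cutoff, the use of cosine similarity, and the assumption that a higher search score is better are all hypothetical choices made for illustration.

```python
from collections import defaultdict
from math import sqrt

# Assumption: a spectrum is a list of (m/z, intensity) peaks and a cluster
# is a list of such spectra. All names below are hypothetical.


def bin_consensus(spectra, bin_width=0.02, min_fraction=0.5):
    """Binning-style consensus: merge peaks that share an m/z bin.

    A bin is kept only if at least `min_fraction` of the cluster members
    contribute a peak to it (an assumed noise filter).
    """
    bins = defaultdict(list)  # bin index -> one (mz, intensity) per spectrum
    for spectrum in spectra:
        strongest = {}
        for mz, intensity in spectrum:
            if intensity <= 0:
                continue  # ignore empty peaks so the weights stay positive
            idx = int(mz // bin_width)
            # Keep only the most intense peak per bin within one spectrum.
            if idx not in strongest or intensity > strongest[idx][1]:
                strongest[idx] = (mz, intensity)
        for idx, peak in strongest.items():
            bins[idx].append(peak)

    consensus = []
    for idx in sorted(bins):
        peaks = bins[idx]
        if len(peaks) / len(spectra) < min_fraction:
            continue
        total = sum(intensity for _, intensity in peaks)
        # Intensity-weighted mean m/z of the peaks in this bin.
        mean_mz = sum(mz * intensity for mz, intensity in peaks) / total
        # Average over the whole cluster; a missing peak counts as zero.
        consensus.append((mean_mz, total / len(spectra)))
    return consensus


def _binned_vector(spectrum, bin_width):
    """Sparse intensity vector indexed by m/z bin."""
    vector = defaultdict(float)
    for mz, intensity in spectrum:
        vector[int(mz // bin_width)] += intensity
    return vector


def _cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def most_similar_index(spectra, bin_width=0.02):
    """Index of the member with the highest mean similarity to the others."""
    vectors = [_binned_vector(s, bin_width) for s in spectra]

    def mean_similarity(i):
        sims = [_cosine(vectors[i], v) for j, v in enumerate(vectors) if j != i]
        return sum(sims) / len(sims) if sims else 0.0

    return max(range(len(spectra)), key=mean_similarity)


def best_identified_index(scores):
    """Index of the member with the best search score (higher assumed better)."""
    return max(range(len(scores)), key=scores.__getitem__)


# Toy cluster: two similar spectra and one carrying an extra peak at m/z 350.
cluster = [
    [(100.1, 10.0), (200.1, 5.0)],
    [(100.2, 12.0), (200.2, 5.0)],
    [(100.1, 8.0), (350.0, 9.0)],
]
consensus = bin_consensus(cluster, bin_width=0.5)
representative = cluster[most_similar_index(cluster, bin_width=0.5)]
best = cluster[best_identified_index([0.2, 0.9, 0.5])]
```

The binning variant builds a new synthetic spectrum, whereas the other two select an existing cluster member as the representative; in this sketch the peak at m/z 350 is dropped from the binned consensus because only one of the three spectra supports it.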

Original language: English
Pages (from-to): 1566-1574
Number of pages: 9
Journal: Journal of Proteome Research
Volume: 21
Issue number: 6
Publication status: Published - 2022 Jun 3

Bibliographical note

Funding Information:
The authors would like to acknowledge the EuBIC-MS community that organized the EuBIC-MS Developer Meeting in January 2020, triggering the original discussions and implementations of this work. L.K. was supported by a grant from the Swedish Research Council (Grant 2017-04030).

Publisher Copyright:
© 2022 American Chemical Society. All rights reserved.

Subject classification (UKÄ)

  • Other Natural Sciences not elsewhere specified

Free keywords

  • benchmark
  • big data
  • clustering
  • consensus spectra
  • mass spectrometry
  • PRIDE database
  • ProteomeXchange
  • spectral libraries
