Numerical compression schemes for proteomics mass spectrometry data.

Research output: Contribution to journal (article)

Standard

Numerical compression schemes for proteomics mass spectrometry data. / Teleman, Johan; Dowsey, Andrew W; Gonzalez-Galarza, Faviel F; Perkins, Simon; Pratt, Brian; Rost, Hannes; Malmstrom, Lars; Malmström, Johan; Jones, Andrew R; Deutsch, Eric W; Levander, Fredrik.

In: Molecular & Cellular Proteomics, Vol. 13, No. 6, 2014, p. 1537-1542.

Harvard

Teleman, J, Dowsey, AW, Gonzalez-Galarza, FF, Perkins, S, Pratt, B, Rost, H, Malmstrom, L, Malmström, J, Jones, AR, Deutsch, EW & Levander, F 2014, 'Numerical compression schemes for proteomics mass spectrometry data', Molecular & Cellular Proteomics, vol. 13, no. 6, pp. 1537-1542. https://doi.org/10.1074/mcp.O114.037879

APA

Teleman, J., Dowsey, A. W., Gonzalez-Galarza, F. F., Perkins, S., Pratt, B., Rost, H., ... Levander, F. (2014). Numerical compression schemes for proteomics mass spectrometry data. Molecular & Cellular Proteomics, 13(6), 1537-1542. https://doi.org/10.1074/mcp.O114.037879

CBE

Teleman J, Dowsey AW, Gonzalez-Galarza FF, Perkins S, Pratt B, Rost H, Malmstrom L, Malmström J, Jones AR, Deutsch EW, Levander F. 2014. Numerical compression schemes for proteomics mass spectrometry data. Molecular & Cellular Proteomics. 13(6):1537-1542. https://doi.org/10.1074/mcp.O114.037879

Vancouver

Teleman J, Dowsey AW, Gonzalez-Galarza FF, Perkins S, Pratt B, Rost H et al. Numerical compression schemes for proteomics mass spectrometry data. Molecular & Cellular Proteomics. 2014;13(6):1537-1542. https://doi.org/10.1074/mcp.O114.037879

Author

Teleman, Johan ; Dowsey, Andrew W ; Gonzalez-Galarza, Faviel F ; Perkins, Simon ; Pratt, Brian ; Rost, Hannes ; Malmstrom, Lars ; Malmström, Johan ; Jones, Andrew R ; Deutsch, Eric W ; Levander, Fredrik. / Numerical compression schemes for proteomics mass spectrometry data. In: Molecular & Cellular Proteomics. 2014 ; Vol. 13, No. 6. pp. 1537-1542.

RIS

TY - JOUR

T1 - Numerical compression schemes for proteomics mass spectrometry data.

AU - Teleman, Johan

AU - Dowsey, Andrew W

AU - Gonzalez-Galarza, Faviel F

AU - Perkins, Simon

AU - Pratt, Brian

AU - Rost, Hannes

AU - Malmstrom, Lars

AU - Malmström, Johan

AU - Jones, Andrew R

AU - Deutsch, Eric W

AU - Levander, Fredrik

PY - 2014

Y1 - 2014

N2 - The open XML format mzML, used for representation of mass spectrometry (MS) data, is pivotal for the development of platform-independent MS analysis software. Although conversion from vendor formats to mzML must take place on a platform on which the vendor libraries are available (i.e. Windows), once mzML files have been generated, they can be used on any platform. However, the mzML format has turned out to be less efficient than vendor formats. In many cases, the naive mzML representation is 4-fold or even up to 18-fold larger compared to the original vendor file. In disk I/O limited setups, a larger data file also leads to longer processing times, which is a problem given the data production rates of modern mass spectrometers. In an attempt to reduce this problem, we here present a family of numerical compression algorithms called MS-Numpress, intended for efficient compression of MS data. To facilitate ease of adoption, the algorithms target the binary data in the mzML standard, and support in main proteomics tools is already available. Using a test set of 10 representative MS data files we demonstrate typical file size decreases of 90% when combined with traditional compression, as well as read time decreases of up to 50%. It is envisaged that these improvements will be beneficial for data handling within the MS community.

U2 - 10.1074/mcp.O114.037879

DO - 10.1074/mcp.O114.037879

M3 - Article

VL - 13

SP - 1537

EP - 1542

JO - Molecular and Cellular Proteomics

T2 - Molecular and Cellular Proteomics

JF - Molecular and Cellular Proteomics

SN - 1535-9484

IS - 6

ER -