Computational MS & AI
Mass spectrometry (MS)-based proteomics has numerous applications, ranging from clinical biomarker discovery to antibody profiling. By measuring the mass-to-charge ratio of peptides and proteins, the method enables the identification and quantification of proteins in a sample, as well as the analysis of post-translational modifications (PTMs) and de novo protein sequencing. While numerous software packages exist for the downstream analysis of MS data, the integration of machine learning techniques in this field is still in its infancy. Our goal is to develop machine learning and, in particular, deep learning applications that streamline the data analysis workflow and provide novel insights that logic-driven approaches would otherwise miss. To this end, we are focusing on the two bottlenecks described below.
One of the first steps in the MS proteomics data-analysis pipeline is the detection of so-called peptide features. The identification and quantification of proteins relies on the detection of individual peptides, primarily in three dimensions: retention time (RT), mass-to-charge ratio (m/z) and signal intensity. Some instruments record an additional fourth dimension, ion mobility. Each peptide produces a characteristic pattern of signal intensities across the RT and m/z axes that reflects its isotope distribution and charge state. Taken together, the isotope peaks from a single peptide make up a peptide feature. The current state of the art in peptide feature detection relies on logic-based algorithms to separate the features from the noise. While this approach is sufficient in most use cases, it struggles with low-intensity features, overlapping features and signal perturbation.
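As a rough illustration of the logic-based approach described above, the sketch below groups centroided peaks from a single scan into an isotope envelope and guesses the charge state from the peak spacing. It assumes that neighbouring isotope peaks of a peptide with charge z lie roughly 1.00335/z apart on the m/z axis. The function names, the tolerance and the candidate charge range are hypothetical choices made for this example, not the method of any particular software package, and the RT dimension is ignored entirely.

```python
from typing import List, Optional, Sequence, Tuple

# Approximate mass difference between 13C and 12C, in daltons.
ISOTOPE_SPACING = 1.00335


def find_isotope_envelope(
    mzs: Sequence[float],
    start_mz: float,
    charge: int,
    tol: float = 0.01,
    max_peaks: int = 6,
) -> List[float]:
    """Collect peaks spaced ISOTOPE_SPACING / charge apart, starting at start_mz."""
    envelope = [start_mz]
    for k in range(1, max_peaks):
        expected = start_mz + k * ISOTOPE_SPACING / charge
        # Take the first observed peak within the tolerance window, if any.
        match = next((mz for mz in mzs if abs(mz - expected) <= tol), None)
        if match is None:
            break  # the envelope ends at the first missing isotope peak
        envelope.append(match)
    return envelope


def guess_charge(
    mzs: Sequence[float],
    start_mz: float,
    charges: Sequence[int] = (1, 2, 3, 4),
    tol: float = 0.01,
) -> Optional[Tuple[int, List[float]]]:
    """Return (charge, envelope) for the charge giving the longest envelope.

    Returns None when no candidate charge yields at least two peaks.
    """
    best: Optional[Tuple[int, List[float]]] = None
    for z in charges:
        envelope = find_isotope_envelope(mzs, start_mz, z, tol)
        if len(envelope) >= 2 and (best is None or len(envelope) > len(best[1])):
            best = (z, envelope)
    return best


if __name__ == "__main__":
    # Toy scan: a doubly charged envelope starting at 500.0 plus one unrelated peak.
    scan = [500.0, 500.50168, 501.00335, 501.50503, 620.3]
    print(guess_charge(scan, 500.0))
```

A fixed spacing rule of this kind is easy to reason about, but it also shows why such approaches struggle: when two envelopes overlap or an isotope peak falls below the noise level, the chain of expected peaks breaks or picks up the wrong signal.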
Following feature detection, peptide identification relies on extensive reference spectral libraries. Peptides whose spectra are known beforehand can be identified in new samples by comparing the measured spectra with these reference spectra. However, problems arise when peptides are detected for which there is no reference. This is fairly common when working with PTMs or peptides of non-tryptic origin, such as immunopeptides or neuropeptides. In such cases, workflows rely on de novo sequencing techniques or predicted spectral libraries.
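The library comparison described above can be sketched as a similarity search. The example below bins each spectrum along the m/z axis and scores a query against every library entry using cosine similarity, a commonly used spectral similarity measure. The bin width, the function names and the toy library entries are assumptions made for illustration only; real search engines use more elaborate scoring and also match on precursor m/z and retention time.

```python
import math
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def bin_spectrum(peaks: List[Peak], bin_width: float = 0.05) -> Dict[int, float]:
    """Sum peak intensities into fixed-width m/z bins."""
    binned: Dict[int, float] = defaultdict(float)
    for mz, intensity in peaks:
        binned[round(mz / bin_width)] += intensity
    return dict(binned)


def cosine_similarity(a: List[Peak], b: List[Peak], bin_width: float = 0.05) -> float:
    """Cosine similarity between two binned spectra, in the range [0, 1]."""
    va, vb = bin_spectrum(a, bin_width), bin_spectrum(b, bin_width)
    dot = sum(value * vb.get(key, 0.0) for key, value in va.items())
    norm_a = math.sqrt(sum(v * v for v in va.values()))
    norm_b = math.sqrt(sum(v * v for v in vb.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty spectrum matches nothing
    return dot / (norm_a * norm_b)


def best_library_match(
    query: List[Peak], library: Dict[str, List[Peak]]
) -> Optional[Tuple[str, float]]:
    """Return (entry name, score) of the most similar library spectrum."""
    if not library:
        return None
    scores = {name: cosine_similarity(query, ref) for name, ref in library.items()}
    best_name = max(scores, key=scores.get)
    return best_name, scores[best_name]


if __name__ == "__main__":
    # Hypothetical library entries with made-up fragment peaks.
    library = {
        "ENTRY_A": [(175.119, 100.0), (276.155, 50.0), (405.198, 80.0)],
        "ENTRY_B": [(147.113, 100.0), (248.160, 60.0), (361.245, 30.0)],
    }
    query = [(175.119, 90.0), (276.156, 55.0), (405.199, 70.0)]
    print(best_library_match(query, library))
```

The sketch also makes the limitation concrete: a peptide that has no entry in the library can only ever receive a low score against unrelated spectra, which is why de novo sequencing or predicted libraries are needed in those cases.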
Researchers
dr. P. (Pavel) Sinitcyn
Assistant Professor
dr. M. (Majid) Mohammadi
Researcher
Publications
Fast and Deep Phosphoproteome Analysis with the Orbitrap Astral Mass Spectrometer
N Lancaster*, P Sinitcyn*, P Forny*, T Peters-Clarke, C Fecher, A Smith, E Shishkova, T Arrey, A Pashkova, ML Robinson, N Arp, J Fan, J Hansen, A Galmozzi, L Serrano, J Rojas, A Gasch, M Westphall, H Stewart, C Hock, E Damoc, D Pagliarini, V Zabrouskov, J Coon
Nature Communications, 2024
Global detection of human variants and isoforms by deep proteome sequencing
P Sinitcyn*, A Richards*, R Weatheritt, D Brademan, H Marx, E Shishkova, J Meyer, A Hebert, M Westphall, B Blencowe, J Cox, J Coon
Nature Biotechnology, 2023
MaxDIA enables library-based and library-free data-independent acquisition proteomics
P Sinitcyn*, H Hamzeiy*, F Salinas*, D Itzhak, F McCarthy, C Wichmann, M Steger, U Ohmayer, U Distler, S Kaspar-Schoenefeld, N Prianichnikov, Ş Yılmaz, J Rudolph, S Tenzer, Y Perez-Riverol, N Nagaraj, S Humphrey, J Cox
Nature Biotechnology, 2021
MaxQuant module for the identification of genomic variants propagated into peptides
P Sinitcyn, M Gerwien, J Cox
Proteomics in Systems Biology: Methods and Protocols, 2022
Computational methods for understanding mass spectrometry–based shotgun proteomics data
P Sinitcyn*, J Rudolph*, J Cox
Annual Review of Biomedical Data Science, 2018