Description

Peptides that were identified in the GENCODE gene set, either as part of a collaboration between the Spanish National Cancer Research Centre (CNIO) and the Centro Nacional de Investigaciones Cardiovasculares (CNIC)[1], or by reanalysis of other proteomics studies[2-14].

Procedures

This proteomic evidence has been collected from various mass spectrometry (MS) sources, covering a range of tissues and cell types.

Prior to GENCODE v33, peptides were collected from 8 distinct large-scale proteomics data sets, incorporating the PeptideAtlas and NIST databases, as well as six published large-scale experiments[1-8]. To improve reliability, peptides from each of these sources were filtered to eliminate non-tryptic peptides, semi-tryptic peptides, and peptides containing missed cleavages, and where possible only peptides identified by multiple search engines were considered. Note that for the Wilhelm analysis[7], we only included the peptides from the publicly available Cellzome experiments on human tissues.

APPRIS releases from GENCODE v33 onwards have used peptide evidence from 2 proteomics studies[8-9], and from GENCODE v38, these were supplemented by peptides from a further 5 studies[10-14]. For these studies, the Comet search engine (Eng et al. 2012) was used with default parameters, and the results were post-processed with Percolator (Käll et al. 2007; The et al. 2016). Peptide-spectrum matches (PSMs) with a Posterior Error Probability (PEP) lower than 0.001 were accepted as long as they were fully tryptic peptides and had no more than 2 missed cleavages. Only peptides with at least 2 valid PSMs across the 7 studies are considered by PROTEO.
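
The PSM acceptance rule described above can be illustrated with a minimal Python sketch. It is not the PROTEO pipeline itself: the `Psm` record, its field names, and the helper functions are hypothetical, and it assumes that tryptic status and missed-cleavage counts have already been computed upstream (for example from the Comet and Percolator output). Only the three thresholds come from the text.

```python
from collections import Counter
from dataclasses import dataclass

# Thresholds taken from the description above.
PEP_THRESHOLD = 0.001        # a PSM must have PEP lower than this
MAX_MISSED_CLEAVAGES = 2     # no more than 2 missed cleavages
MIN_VALID_PSMS = 2           # at least 2 valid PSMs per peptide


@dataclass(frozen=True)
class Psm:
    """Hypothetical record for one peptide-spectrum match."""
    peptide: str
    pep: float               # posterior error probability
    fully_tryptic: bool
    missed_cleavages: int


def is_valid_psm(psm: Psm) -> bool:
    """Return True if the PSM passes all three acceptance criteria."""
    return (
        psm.pep < PEP_THRESHOLD
        and psm.fully_tryptic
        and psm.missed_cleavages <= MAX_MISSED_CLEAVAGES
    )


def supported_peptides(psms):
    """Count valid PSMs per peptide, pooled across all studies,
    and keep the peptides that reach the minimum count."""
    counts = Counter(p.peptide for p in psms if is_valid_psm(p))
    return {pep: n for pep, n in counts.items() if n >= MIN_VALID_PSMS}
```

In this sketch the PSMs from all studies are pooled into a single list before counting, which is one reading of "at least 2 valid PSMs across the 7 studies".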

Annotations

The following table describes the annotations of PROTEO tracks:

Field | Example | Info type | Description
Item | GVSEAAVLKPSEELPAEATSSVEPEK | string | Peptide found in large-scale human proteomics experiments and databases
Score | 2 | int, range [2-8] | Number of experiments where the peptide was found. The peptide has to be included in at least 2 experiments
Position | chr21:33043745-33043822 | chr:start-end | Genome position
List of Ensembl transcript ID | ENST00000399804;ENST00000286835 | string | List of variants (designated by Ensembl transcript ID) that contain the peptide
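
As an illustration of how these fields fit together, the following Python sketch parses one annotation into a typed record. The `ProteoItem` class and the `parse_proteo_item` function are hypothetical names introduced only for this example. The sketch assumes the four fields arrive as separate strings in the formats shown in the table, and it makes no assumption about whether the coordinates are 0-based or 1-based.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProteoItem:
    """Hypothetical container for one PROTEO annotation."""
    peptide: str
    score: int
    chrom: str
    start: int
    end: int
    transcripts: List[str]


def parse_proteo_item(item: str, score: str, position: str,
                      transcript_ids: str) -> ProteoItem:
    """Parse the four annotation fields described in the table."""
    # Position has the form "chr:start-end".
    chrom, span = position.split(":")
    start, end = (int(x) for x in span.split("-"))
    # Transcript IDs are separated by semicolons.
    transcripts = [t for t in transcript_ids.split(";") if t]
    return ProteoItem(item, int(score), chrom, start, end, transcripts)


# Usage with the example values from the table.
example = parse_proteo_item(
    "GVSEAAVLKPSEELPAEATSSVEPEK",
    "2",
    "chr21:33043745-33043822",
    "ENST00000399804;ENST00000286835",
)
```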

Credits

PROTEO data has been generated by the Spanish National Cancer Research Centre (CNIO) in collaboration with the Centro Nacional de Investigaciones Cardiovasculares (CNIC) and the Spanish Institute of Bioinformatics, using a computational pipeline developed by Michael Tress, Iakes Ezkurdia and Jose Manuel Rodriguez. More recently, this computational pipeline was augmented by an additional component, the protter workflow, developed by Tomás Di Domenico and Thomas Walsh.

Contacts

If you have questions or comments, please contact us.

Data Release Policy

All data is freely available to the public.

References

[1] Ezkurdia, I.; Rodriguez, J.M.; Carrillo-de Santa Pau, E.; Vazquez, J.; Valencia, A.; Tress, M.L.
Most highly expressed protein-coding genes have a single dominant isoform.
J. Proteome Res. 2015, 14, 1880-1887. doi:10.1021/pr501286b. PubMed PMID: 25732134.

[2] Farrah, T.; Deutsch, E.W.; Hoopmann, M.R.; Hallows, J.L.; Sun, Z.; Huang, C.Y.; Moritz, R.L.
The state of the human proteome in 2012 as viewed through PeptideAtlas.
J. Proteome Res. 2013, 12, 162-171

[3] Ezkurdia, I.; del Pozo, A.; Frankish, A.; Rodriguez, J.M.; Harrow, J.; Ashman, K.; Valencia, A.; Tress, M.L.
Comparative proteomics reveals a significant bias toward alternative protein isoforms with conserved structure and function.
Mol. Biol. Evol. 2012, 29, 2265-2283

[4] Munoz, J.; Low, T.Y.; Kok, Y.J.; Chin, A.; Frese, C.K.; Ding, V.; Choo, A.; Heck, A.J.
The quantitative proteomes of human-induced pluripotent stem cells and embryonic stem cells.
Mol. Syst. Biol. 2011, 7, 550

[5] Nagaraj, N.; Wisniewski, J.R.; Geiger, T.; Cox, J.; Kircher, M.; Kelso, J.; Pääbo, S.; Mann, M.
Deep proteome and transcriptome mapping of a human cancer cell line.
Mol. Syst. Biol. 2011, 7, 548

[6] Geiger, T.; Wehner, A.; Schaab, C.; Cox, J.; Mann, M.
Comparative proteomic analysis of eleven common cell lines reveals ubiquitous but varying expression of most proteins.
Mol. Cell Proteomics 2012, 11, M111.014050

[7] Wilhelm, M.; Schlegl, J.; Hahne, H.; Gholami, A.M.; Lieberenz, M.; Savitski, M.M.; … Kuster, B.
Mass-spectrometry-based draft of the human proteome.
Nature 2014, 509, 582-587

[8] Kim, M.S.; Pinto, S.M.; Getnet, D.; Nirujogi, R.S.; Manda, S.S.; Chaerkady, R.; … Pandey, A.
A draft map of the human proteome.
Nature 2014, 509, 575-581

[9] Wang, D.; Eraslan, B.; Wieland, T.; Hallström, B.; Hopf, T.; Zolg, D.P.; … Kuster, B.
A deep proteome and transcriptome abundance atlas of 29 healthy human tissues.
Mol. Syst. Biol. 2019, 15, e8503

[10] Zhang, B.; Wang, J.; Wang, X.; Zhu, J.; Liu, Q.; Shi, Z.; … NCI CPTAC.
Proteogenomic characterization of human colon and rectal cancer.
Nature 2014, 513, 382-387

[11] Bekker-Jensen, D.B.; Kelstrup, C.D.; Batth, T.S.; Larsen, S.C.; Haldrup, C.; Bramsen, J.B.; … Olsen, J.V.
An optimized shotgun strategy for the rapid generation of comprehensive human proteomes.
Cell Syst. 2017, 4, 587-599.e4

[12] Carlyle, B.C.; Kitchen, R.R.; Kanyo, J.E.; Voss, E.Z.; Pletikos, M.; Sousa, A.M.M.; … Nairn, A.C.
A multiregional proteomic survey of the postnatal human brain.
Nat. Neurosci. 2017, 20, 1787-1795

[13] Schiza, C.; Korbakis, D.; Jarvi, K.; Diamandis, E.P.; Drabovich, A.P.
Identification of TEX101-associated proteins through proteomic measurement of human spermatozoa homozygous for the missense variant rs35033974.
Mol. Cell Proteomics 2019, 18, 338-351

[14] Jiang, L.; Wang, M.; Lin, S.; Jian, R.; Li, X.; Chan, J.; … Snyder, M.P.
A quantitative proteome map of the human body.
Cell 2020, 183, 269-283.e19