{PROTEO} - Peptide evidence for the human genome

Description

The peptides in this track were identified[1] in a collaboration between the Spanish National Cancer Research Centre (CNIO) and the Centro Nacional de Investigaciones Cardiovasculares (CNIC).

We interrogated peptides from eight large-scale human proteomics experiments and databases, irrespective of tissue or cell type, for the vast majority of the protein-coding genes in these experiments. A peptide is reported only if it was found in at least 2 of the experiments.
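
The "at least 2 experiments" rule can be illustrated with a minimal sketch. It assumes each experiment is available as a plain list of peptide sequences; the function name `score_peptides`, the experiment labels, and the toy peptides are hypothetical and are not part of the PROTEO pipeline.

```python
from collections import Counter


def score_peptides(experiments, min_experiments=2):
    """Return {peptide: number of experiments reporting it} for peptides
    found in at least `min_experiments` experiments.

    `experiments` maps a (hypothetical) experiment label to an iterable
    of peptide sequences.
    """
    counts = Counter()
    for peptides in experiments.values():
        # Use a set so that an experiment reporting the same peptide
        # several times still contributes only once to the count.
        counts.update(set(peptides))
    return {pep: n for pep, n in counts.items() if n >= min_experiments}


# Toy input: invented sequences, for illustration only.
experiments = {
    "exp_a": ["PEPTIDEK", "AAAK"],
    "exp_b": ["PEPTIDEK", "PEPTIDEK"],
    "exp_c": ["CCCR"],
}
scores = score_peptides(experiments)
```

In this sketch the returned count plays the role of the `Score` annotation: with eight input data sets it can range from 2 to 8.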

Procedures

We collected the peptides for the analysis from eight distinct large-scale proteomics data sets: the PeptideAtlas[2] and NIST databases and six published large-scale experiments[3-8]. For the Wilhelm analysis, we only included the peptides from the publicly available Cellzome experiments on human tissues.

The eight studies covered a wide range of tissues and cell types: the peptides from the PeptideAtlas database cover 51 different tissues, cell types, and developmental stages, whereas the Geiger study interrogated 11 different cell types. The spectra from the PeptideAtlas database were only part of the NIST database and the Ezkurdia analyses. The peptides in the Kim and Wilhelm analyses were generated from 30 and 35 distinct tissue types, respectively (51 tissues in total).

Annotations

The following table describes the annotations of PROTEO tracks:

Field	Example	Info type	Description
`Item`	GVSEAAVLKPSEELPAEATSSVEPEK	`string`	Peptide found in large-scale human proteomics experiments and databases
`Score`	2	`int range [2-8]`	Number of experiments where the peptide was found. The peptide has to be included in at least 2 experiments
`Position`	chr21:33043745-33043822	`chr:start-end`	Genome position
`List of Ensembl transcript ID`	ENST00000399804;ENST00000286835	`string`	Semicolon-separated list of transcript variants (designated by Ensembl transcript ID) that contain the peptide
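
As an illustration of how these annotations might be consumed, the sketch below turns the four field values of one item into a typed record. It assumes the values arrive as plain strings laid out as in the example column above; the names `ProteoRecord` and `parse_record` are hypothetical, and the on-disk format of the track is not specified here.

```python
import re
from typing import List, NamedTuple


class ProteoRecord(NamedTuple):
    peptide: str
    score: int
    chrom: str
    start: int
    end: int
    transcripts: List[str]


# Matches positions written as chr:start-end, e.g. chr21:33043745-33043822.
_POSITION = re.compile(r"(chr[\w.]+):(\d+)-(\d+)")


def parse_record(item, score, position, transcript_ids):
    """Validate and convert the four annotation values of one item."""
    n = int(score)
    if not 2 <= n <= 8:
        raise ValueError(f"Score outside the documented range [2-8]: {n}")
    match = _POSITION.fullmatch(position)
    if match is None:
        raise ValueError(f"Position is not in chr:start-end form: {position!r}")
    chrom, start, end = match.group(1), int(match.group(2)), int(match.group(3))
    # Transcript IDs are separated by semicolons in the example value.
    transcripts = [t for t in transcript_ids.split(";") if t]
    return ProteoRecord(item, n, chrom, start, end, transcripts)


# The example values from the table above.
rec = parse_record(
    "GVSEAAVLKPSEELPAEATSSVEPEK",
    "2",
    "chr21:33043745-33043822",
    "ENST00000399804;ENST00000286835",
)
```

Rejecting scores outside 2 to 8 mirrors the documented range; a consumer that prefers to tolerate unexpected values could relax that check.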

Credits

The data was generated by the Spanish National Cancer Research Centre (CNIO), the Centro Nacional de Investigaciones Cardiovasculares (CNIC) and the Spanish Institute of Bioinformatics using a computational pipeline developed by Michael Tress, Iakes Ezkurdia and Jose Manuel Rodriguez.

Contacts

If you have questions or comments, please contact us.

Data Release Policy

All data is freely available to the public.

References

[1] Ezkurdia I, Rodriguez JM, Carrillo-de Santa Pau E, Vazquez J, Valencia A, Tress ML.
Most highly expressed protein-coding genes have a single dominant isoform.
J Proteome Res. 2015 Apr 3;14(4):1880-7. doi: 10.1021/pr501286b. Epub 2015 Mar 11. PubMed PMID: 25732134.

[2] Farrah, T.; Deutsch, E.W.; Hoopmann, M.R.; Hallows, J.L.; Sun, Z.; Huang, C.Y.; Moritz, R.L.
The state of the human proteome in 2012 as viewed through PeptideAtlas.
J. Proteome Res. 2013, 12, 162-171

[3] Ezkurdia, I.; del Pozo, A.; Frankish, A.; Rodriguez, J.M.; Harrow, J.; Ashman, K.; Valencia, A.; Tress, M.L.
Comparative proteomics reveals a significant bias toward alternative protein isoforms with conserved structure and function.
Mol. Biol. Evol. 2012, 29, 2265-2283

[4] Munoz, J.; Low, T.Y.; Kok, Y.J.; Chin, A.; Frese, C.K.; Ding, V.; Choo, A.; Heck, A.J.
The quantitative proteomes of human-induced pluripotent stem cells and embryonic stem cells.
Mol. Syst. Biol. 2011, 7, 550

[5] Nagaraj, N.; Wisniewski, J.R.; Geiger, T.; Cox, J.; Kircher, M.; Kelso, J.; Pääbo, S.; Mann, M.
Deep proteome and transcriptome mapping of a human cancer cell line.
Mol. Syst. Biol. 2011, 7, 548

[6] Geiger, T.; Wehner, A.; Schaab, C.; Cox, J.; Mann, M.
Comparative proteomic analysis of eleven common cell lines reveals ubiquitous but varying expression of most proteins.
Mol. Cell Proteomics 2012, 11, M111.014050

[7] Wilhelm, M.; Schlegl, J.; Hahne, H.; Gholami, A.M.; Lieberenz, M.; Savitski, M.M.; … Kuster, B.
Mass-spectrometry-based draft of the human proteome.
Nature 2014, 509, 582-587

[8] Kim, M.S.; Pinto, S.M.; Getnet, D.; Nirujogi, R.S.; Manda, S.S.; Chaerkady, R.; … Pandey, A.
A draft map of the human proteome.
Nature 2014, 509, 575-581

[9] Wang, D.; Eraslan, B.; Wieland, T.; Hallström, B.; Hopf, T.; Zolg, D.P.; … Kuster, B.
A deep proteome and transcriptome abundance atlas of 29 healthy human tissues.
Mol. Syst. Biol. 2019, 15, e8503

[10] Zhang, B.; Wang, J.; Wang, X.; Zhu, J.; Liu, Q.; Shi, Z.; … NCI CPTAC.
Proteogenomic characterization of human colon and rectal cancer.
Nature 2014, 513, 382-387

[11] Bekker-Jensen, D.B.; Kelstrup, C.D.; Batth, T.S.; Larsen, S.C.; Haldrup, C.; Bramsen, J.B.; … Olsen, J.V.
An optimized shotgun strategy for the rapid generation of comprehensive human proteomes.
Cell Syst. 2017, 4, 587-599.e4

[12] Carlyle, B.C.; Kitchen, R.R.; Kanyo, J.E.; Voss, E.Z.; Pletikos, M.; Sousa, A.M.M.; … Nairn, A.C.
A multiregional proteomic survey of the postnatal human brain.
Nat. Neurosci. 2017, 20, 1787-1795

[13] Schiza, C.; Korbakis, D.; Jarvi, K.; Diamandis, E.P.; Drabovich, A.P.
Identification of TEX101-associated proteins through proteomic measurement of human spermatozoa homozygous for the missense variant rs35033974.
Mol. Cell Proteomics 2019, 18, 338-351

[14] Jiang, L.; Wang, M.; Lin, S.; Jian, R.; Li, X.; Chan, J.; … Snyder, M.P.
A quantitative proteome map of the human body.
Cell 2020, 183, 269-283.e19