Peptides identified[1] in a collaboration between the Spanish National Cancer Research Centre (CNIO) and the Centro Nacional de Investigaciones Cardiovasculares (CNIC).

We interrogate peptides from eight large-scale human proteomics experiments and databases, irrespective of tissue or cell type, for the vast majority of the protein-coding genes in these experiments. A peptide is included only if it was found in at least 2 of the experiments.
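The inclusion rule above can be illustrated with a minimal Python sketch. It is not the pipeline's actual code: the function names, the source names, and the toy peptide sequences are all hypothetical, and the only rule taken from the text is that a peptide must appear in at least 2 sources.

```python
from collections import Counter


def count_support(sources):
    """Map each peptide to the number of sources that contain it."""
    support = Counter()
    for peptides in sources.values():
        # Use a set so a peptide is counted once per source,
        # even if that source lists it several times.
        support.update(set(peptides))
    return support


def filter_peptides(sources, min_sources=2):
    """Keep only peptides seen in at least `min_sources` sources."""
    support = count_support(sources)
    return {pep: n for pep, n in support.items() if n >= min_sources}


# Hypothetical toy input: three made-up sources with made-up peptides.
toy_sources = {
    "source_a": ["PEPTIDEK", "AAAK", "AAAK"],
    "source_b": ["PEPTIDEK", "GGGR"],
    "source_c": ["PEPTIDEK", "GGGR"],
}

kept = filter_peptides(toy_sources)
```

In this toy input, "AAAK" is dropped because it occurs in a single source, even though that source lists it twice.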


We collected the peptides for the analysis from eight distinct large-scale proteomics data sets: the PeptideAtlas[2] and NIST databases and six published large-scale experiments[3-8]. For the Wilhelm analysis, we included only the peptides from the publicly available Cellzome experiments on human tissues.

The eight studies covered a wide range of tissues and cell types: the peptides from the PeptideAtlas database cover 51 different tissues, cell types, and developmental stages, whereas the Geiger study interrogated 11 different cell types. The spectra from the PeptideAtlas database were only part of the NIST database and the Ezkurdia analyses. The peptides in the Kim and Wilhelm analyses were generated from 30 and 35 distinct tissue types, respectively (51 tissues in total).


The following table describes the annotations of PROTEO tracks:

| Field | Example | Info type | Description |
| --- | --- | --- | --- |
| Item | GVSEAAVLKPSEELPAEATSSVEPEK | string | Peptide found in the 8 large-scale human proteomics experiments and databases |
| Score | 2 | int, range [2-8] | Number of experiments in which the peptide was found. A peptide must be found in at least 2 experiments |
| Position | chr21:33043745-33043822 | chr:start-end | Genome position |
| List of Ensembl transcript IDs | ENST00000399804;ENST00000286835 | string | List of variants (Ensembl transcript IDs) that contain the peptide |
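As an illustration of how the four fields fit together, the following Python sketch parses the example values from the table into a small record. The class name `ProteoRecord` and the function `parse_record` are hypothetical and not part of any published tool. The sketch assumes only what the table states: a score in the range [2-8], a position written as chr:start-end, and transcript IDs separated by semicolons. It makes no assumption about the coordinate convention of the position.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProteoRecord:
    """One PROTEO annotation (hypothetical container for illustration)."""
    peptide: str
    score: int
    chrom: str
    start: int
    end: int
    transcripts: List[str]


def parse_record(item, score, position, transcript_ids):
    """Parse the four table fields, given as strings, into a record."""
    score = int(score)
    if not 2 <= score <= 8:
        raise ValueError(f"score {score} is outside the range [2-8]")
    # Position is written as chr:start-end.
    chrom, span = position.split(":")
    start, end = (int(x) for x in span.split("-"))
    # Transcript IDs are separated by semicolons.
    transcripts = [t for t in transcript_ids.split(";") if t]
    return ProteoRecord(item, score, chrom, start, end, transcripts)


# Example values copied from the table above.
record = parse_record(
    "GVSEAAVLKPSEELPAEATSSVEPEK",
    "2",
    "chr21:33043745-33043822",
    "ENST00000399804;ENST00000286835",
)
```

Rejecting scores outside [2-8] mirrors the rule that a peptide must be found in at least 2 of the 8 sources.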


The data was generated by the Spanish National Cancer Research Centre (CNIO), the Centro Nacional de Investigaciones Cardiovasculares (CNIC) and the Spanish Institute of Bioinformatics, using a computational pipeline developed by Michael Tress, Iakes Ezkurdia and Jose Manuel Rodriguez.


If you have questions or comments, please write to:

Data Release Policy

All data is freely available to the public.


[1] Ezkurdia, I.; Rodriguez, J. M.; Carrillo-de Santa Pau, E.; Vazquez, J.; Valencia, A.; Tress, M. L.
Most highly expressed protein-coding genes have a single dominant isoform.
J. Proteome Res. 2015, 14, 1880-1887. doi: 10.1021/pr501286b. PubMed PMID: 25732134.

[2] Farrah, T.; Deutsch, E. W.; Hoopmann, M. R.; Hallows, J. L.; Sun, Z.; Huang, C. Y.; Moritz, R. L.
The state of the human proteome in 2012 as viewed through PeptideAtlas.
J. Proteome Res. 2013, 12, 162-171

[3] Ezkurdia, I.; del Pozo, A.; Frankish, A.; Rodriguez, J. M.; Harrow, J.; Ashman, K.; Valencia, A.; Tress, M. L.
Comparative proteomics reveals a significant bias toward alternative protein isoforms with conserved structure and function.
Mol. Biol. Evol. 2012, 29, 2265-2283

[4] Munoz, J.; Low, T. Y.; Kok, Y. J.; Chin, A.; Frese, C. K.; Ding, V.; Choo, A.; Heck, A. J.
The quantitative proteomes of human-induced pluripotent stem cells and embryonic stem cells.
Mol. Syst. Biol. 2011, 7, 550

[5] Nagaraj, N.; Wisniewski, J. R.; Geiger, T.; Cox, J.; Kircher, M.; Kelso, J.; Pääbo, S.; Mann, M.
Deep proteome and transcriptome mapping of a human cancer cell line.
Mol. Syst. Biol. 2011, 7, 548

[6] Geiger, T.; Wehner, A.; Schaab, C.; Cox, J.; Mann, M.
Comparative proteomic analysis of eleven common cell lines reveals ubiquitous but varying expression of most proteins.
Mol. Cell. Proteomics 2012, 11, M111.014050

[7] Kim, M. S.; Pinto, S. M.; Getnet, D.; Nirujogi, R. S.; Manda, S. S.; Chaerkady, R.; Madugundu, A. K.; Kelkar, D. S.; Isserlin, R.; Jain, S.
A draft map of the human proteome.
Nature 2014, 509, 575-581

[8] Wilhelm, M.; Schlegl, J.; Hahne, H.; Moghaddas Gholami, A.; Lieberenz, M.; Savitski, M. M.; Ziegler, E.; Butzmann, L.; Gessulat, S.; Marx, H.
Mass-spectrometry-based draft of the human proteome.
Nature 2014, 509, 582-587

Note: Reference [1] is the main paper that provides the collected peptides.