CCancer & PLIPS databases
CCancer is an automatically collected database of gene lists reported mostly by experimental studies in various biological and clinical contexts. At the moment, the database covers more than 3500 gene lists extracted from 3000 papers published in ~100 peer-reviewed journals. CCancer accepts a gene list as input. An enrichment analysis then generates, as output, a survey of recently published studies whose reported gene lists intersect significantly with the query gene list. A report on gene pairs from the input list that were frequently reported together by other biological studies is also provided. More details can be found in the original publication.
By searching through major proteomics journals, we have collected more than 1000 recently published independent studies, which reported about 1500 different protein lists. On the basis of these data, we developed the computational tool PLIPS (Protein Lists Identified in Proteomics Studies). PLIPS accepts a list of protein/gene identifiers as input. Using statistical analysis, PLIPS identifies recently published proteomics studies whose reported protein lists intersect significantly with the query list. More details can be found in the original publication.
The CCancer&PLIPS database of gene/protein lists is used to associate genes into a global network. In total, the CCancer&PLIPS database contains 5238 gene/protein lists reported in various functional contexts by independent studies. For each pair of genes we can count the number of times k12 they were reported together, as well as the number of times each gene was reported individually (k1, k2). We can use a standard urn scheme to derive significantly associated gene pairs. Let N denote the total number of gene/protein lists in the CCancer&PLIPS database. If the two genes are reported independently of each other, the value k12 follows a hypergeometric distribution with parameters N, k1 and k2 (k1 balls are drawn without replacement from an urn containing N balls in total, k2 of which are white). The derived p-value also needs to be adjusted for multiple testing, because each gene is tested against all other genes. We used the Bonferroni correction for this purpose. At the significance level p-value < 0.01, each gene from the CCancer&PLIPS database was associated on average with about 20 other genes.
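The urn scheme described above can be sketched in a few lines of Python. This is only an illustration, not the code used by CCancer&PLIPS: the function names are hypothetical, the test is assumed to be one-sided (the probability of observing at least k12 joint reports), and the counts in the usage example are made up.

```python
from math import comb


def hypergeom_upper_tail(k12: int, n_lists: int, k1: int, k2: int) -> float:
    """P(X >= k12) for the urn scheme: k1 balls are drawn without
    replacement from an urn of n_lists balls, k2 of which are white."""
    max_overlap = min(k1, k2)
    total = comb(n_lists, k1)  # all ways to choose the k1 lists
    # Sum the probabilities of every overlap from k12 up to the maximum.
    tail = sum(
        comb(k2, i) * comb(n_lists - k2, k1 - i)
        for i in range(k12, max_overlap + 1)
    )
    return tail / total


def bonferroni(p_value: float, n_tests: int) -> float:
    """Bonferroni adjustment: multiply by the number of tests, cap at 1."""
    return min(1.0, p_value * n_tests)


if __name__ == "__main__":
    # Made-up counts for one gene pair; N = 5238 lists as stated in the text.
    p = hypergeom_upper_tail(k12=12, n_lists=5238, k1=40, k2=35)
    # Assumed number of other genes this gene is tested against.
    print(bonferroni(p, n_tests=15000))
```

A gene pair would then be kept in the network when the adjusted value falls below the chosen threshold (0.01 in the text).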
CCancer spider implements the Global Network statistical framework to analyze a gene list, using the global gene association network derived from the CCancer&PLIPS database as reference knowledge. Like other spider tools, CCancer spider highlights genes in the output graphical network model based on their annotation with Human Disease Ontology (HDO) terms. Each gene/protein list in the CCancer&PLIPS database is annotated with one or several HDO terms, based on whether the terms are overrepresented in the text of the corresponding paper. For each gene and HDO term we can count the number of times k12 they were related to the same CCancer&PLIPS gene/protein list, as well as the number of times the gene and the term are present individually (k1, k2). As in the case of pairwise gene associations, we can use a standard urn scheme to derive significant associations between a gene and an HDO term. The value k12 follows a hypergeometric distribution with parameters N, k1 and k2 (k1 balls are drawn without replacement from an urn containing N balls in total, k2 of which are white). We also adjust the derived p-value for multiple testing (each gene is tested against all HDO terms).
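As a rough illustration of the counting step, the sketch below derives k1, k2 and k12 for gene and HDO term pairs from a collection of annotated lists. The data layout (a pair of gene set and term set per list), the function name and the example entries are assumptions made for this sketch; k1 and k2 are taken here to be the total number of lists containing the gene and the total number of lists annotated with the term.

```python
from collections import Counter
from itertools import product


def count_gene_term(annotated_lists):
    """Count list-level occurrences of genes, terms and gene-term pairs.

    annotated_lists: iterable of (genes, terms) pairs, one per gene/protein list.
    Returns three Counters: genes (k1), terms (k2), (gene, term) pairs (k12).
    """
    gene_counts, term_counts, joint_counts = Counter(), Counter(), Counter()
    for genes, terms in annotated_lists:
        genes, terms = set(genes), set(terms)  # count each list at most once
        gene_counts.update(genes)
        term_counts.update(terms)
        joint_counts.update(product(genes, terms))
    return gene_counts, term_counts, joint_counts


if __name__ == "__main__":
    # Illustrative entries only; not taken from the database.
    example = [
        ({"TP53", "BRCA1"}, {"breast cancer"}),
        ({"TP53"}, {"breast cancer", "lung cancer"}),
        ({"EGFR"}, {"lung cancer"}),
    ]
    k1, k2, k12 = count_gene_term(example)
    print(k1["TP53"], k2["breast cancer"], k12[("TP53", "breast cancer")])
```

These counts, together with the total number of lists N, are the inputs to the hypergeometric test described in the text.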
Reference: if you find the results produced by CCancer & PLIPS useful, please cite both:
1. Dietmann S, Lee W, Wong P, Rodchenkov I, Antonov AV. CCancer: a bird's eye view on gene lists reported in cancer-related studies. Nucleic Acids Research, 2010, Vol. 38, No. suppl_2, W118-W123.
2. Antonov AV, Dietmann S, Wong P, Rodchenkov I, Mewes HW. PLIPS, an Automatically Collected Database of Protein Lists Reported by Proteomics Studies. J. Proteome Res., 2009, 8 (3), pp 1193-1197.