What is an enrichment score?
On panels this large, nearly every gene is differentially expressed somewhere, so differential expression alone does not identify tissue-specific genes. To find them, an enrichment score benchmarks a gene's expression in one tissue against its expression in all other tissues. Widely expressed genes score low, around 0, whereas tissue-specific genes score closer to 1. Because the score is comparable across genes, it can be used to rank the genes in each tissue.
How is the enrichment score calculated?
The enrichment score is calculated with the limma package in Bioconductor. In a data set the size of a body atlas, each tissue is compared pairwise with each of the other tissues. The limma package fits a linear model and returns a coefficient for each pairwise comparison; that coefficient measures the difference in expression between the two groups. Only significant coefficients (p < 0.05) are counted, and the enrichment score of a gene in a tissue is the sum of its significant pairwise coefficients for that tissue. The figure below illustrates the method by calculating the enrichment scores for tissue G1.
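The aggregation step can be sketched in Python. This is a minimal sketch, not the site's implementation: it assumes the pairwise coefficients and p-values have already been computed (for example with limma), and the input format, the function name `enrichment_score`, and the choice to add every significant coefficient regardless of its sign are assumptions for illustration. It returns the raw sum and does not attempt any rescaling.

```python
from typing import Dict, Tuple

# Hypothetical input for one gene: (tissue, other_tissue) -> (coefficient, p_value),
# e.g. values exported from a limma pairwise comparison.
PairwiseStats = Dict[Tuple[str, str], Tuple[float, float]]


def enrichment_score(stats: PairwiseStats, tissue: str, alpha: float = 0.05) -> float:
    """Sum the significant coefficients of `tissue` compared with every other tissue."""
    total = 0.0
    for (first, second), (coef, p_value) in stats.items():
        # Only comparisons of `tissue` against a different tissue are relevant.
        if first != tissue or second == tissue:
            continue
        # Assumption: a comparison counts only if p < alpha (0.05 in the text).
        if p_value < alpha:
            total += coef
    return total


# Illustrative (made-up) values for tissue G1 against three other tissues.
g1_stats = {
    ("G1", "G2"): (2.0, 0.001),
    ("G1", "G3"): (1.5, 0.010),
    ("G1", "G4"): (0.3, 0.400),  # not significant, so it is ignored
}
print(enrichment_score(g1_stats, "G1"))  # 3.5
```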
What are the small numbers next to the bars?
In gene view, the numbers on the right give the rank of that gene in each tissue. In tissue view, the number gives the rank of that tissue for each gene. Consider the example below for GPR171. GPR171 is ranked 7th in Th1 (on the left), and Th1 is the 6th-ranking tissue for GPR171 (on the right). So when viewing the Th1 plot, you can tell that although GPR171 ranks 7th, it is not especially specific to Th1, because Th1 is only its 6th-ranking tissue. Reading both numbers saves time: you are typically looking for genes such as IFNG, GZMB and LTA, which both score high in Th1 and have Th1 as their top-scoring tissue.
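The two ranks can be reproduced from a table of enrichment scores. This is a minimal sketch, assuming a hypothetical nested dictionary `scores[gene][tissue]`; the gene names, tissue names and values are made up for illustration and do not come from the atlas. `GENE_B` plays the role of GPR171: it ranks high in `T1`, but `T1` is not its top tissue.

```python
from typing import Dict, List

# Hypothetical score table: scores[gene][tissue] -> enrichment score.
Scores = Dict[str, Dict[str, float]]


def gene_rank_in_tissue(scores: Scores, gene: str, tissue: str) -> int:
    """1-based rank of `gene` among all genes, ordered by their score in `tissue`."""
    ordered = sorted(scores, key=lambda g: scores[g][tissue], reverse=True)
    return ordered.index(gene) + 1


def tissue_rank_for_gene(scores: Scores, gene: str, tissue: str) -> int:
    """1-based rank of `tissue` among all tissues, ordered by `gene`'s scores."""
    ordered = sorted(scores[gene], key=lambda t: scores[gene][t], reverse=True)
    return ordered.index(tissue) + 1


def specific_top_genes(scores: Scores, tissue: str, top_n: int) -> List[str]:
    """Genes in the top `top_n` for `tissue` whose own top-scoring tissue is `tissue`."""
    top = sorted(scores, key=lambda g: scores[g][tissue], reverse=True)[:top_n]
    return [g for g in top if tissue_rank_for_gene(scores, g, tissue) == 1]


scores = {
    "GENE_A": {"T1": 9.0, "T2": 1.0, "T3": 0.5},
    "GENE_B": {"T1": 7.0, "T2": 8.0, "T3": 7.5},
    "GENE_C": {"T1": 2.0, "T2": 0.0, "T3": 6.0},
}
print(gene_rank_in_tissue(scores, "GENE_B", "T1"))   # 2
print(tissue_rank_for_gene(scores, "GENE_B", "T1"))  # 3
print(specific_top_genes(scores, "T1", top_n=2))     # ['GENE_A']
```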
Why is GAPDH not enriched? It's a highly expressed gene.
The enrichment score should not be confused with expression level. The score favors genes that are tissue specific. Housekeeping genes such as GAPDH are highly expressed in all tissues, so they are not enriched in any tissue.
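A toy calculation shows why high expression alone does not produce enrichment. This is a simplified illustration, not the site's method: it assumes each pairwise coefficient is simply the difference in mean log2 expression between two tissues and skips significance testing entirely. The function name and expression values are invented.

```python
from typing import List


def simplified_score(log2_means: List[float], tissue_index: int) -> float:
    """Sum of (tissue mean - other tissue mean) over all other tissues.

    Simplification: no statistical test is applied, so every difference counts.
    """
    own = log2_means[tissue_index]
    return sum(own - other for i, other in enumerate(log2_means) if i != tissue_index)


housekeeping = [14.0, 14.0, 14.0, 14.0]  # uniformly high, like a housekeeping gene
specific = [12.0, 4.0, 4.0, 4.0]         # lower peak, but only in tissue 0

print(simplified_score(housekeeping, 0))  # 0.0
print(simplified_score(specific, 0))      # 24.0
```

Even though the housekeeping-like gene is expressed more strongly in every tissue, it has no between-tissue differences to sum, so its score is 0.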
How can I use this system to discover new genes relevant to my research?
If you are interested in a specific gene, use the gene view to locate the tissues where it is enriched. High enrichment in a subset of tissues suggests the gene has a function in those tissues. Use the "view coregulated genes" menu to identify other genes with similar profiles. The functions of those genes suggest a related function for your gene of interest.
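The idea behind "view coregulated genes" can be approximated offline by ranking genes whose enrichment profiles across tissues correlate with a query gene. This is a minimal sketch: the site's actual similarity measure is not described here, so Pearson correlation is an assumption, and the function names, gene names and profiles are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation; returns 0.0 if either profile is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def coregulated(
    profiles: Dict[str, List[float]], query: str, top_n: int = 5
) -> List[Tuple[str, float]]:
    """Other genes ordered by how well their profile correlates with `query`'s."""
    target = profiles[query]
    sims = [(g, pearson(target, p)) for g, p in profiles.items() if g != query]
    sims.sort(key=lambda pair: pair[1], reverse=True)
    return sims[:top_n]


# Hypothetical enrichment profiles across four tissues.
profiles = {
    "GENE_A": [5.0, 0.0, 0.0, 1.0],
    "GENE_B": [4.0, 0.0, 1.0, 1.0],
    "GENE_C": [0.0, 5.0, 0.0, 0.0],
    "GENE_D": [1.0, 1.0, 1.0, 1.0],
}
for gene, r in coregulated(profiles, "GENE_A", top_n=2):
    print(gene, round(r, 2))
```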
If you are interested in a tissue, examine the top-ranking genes in that tissue. Scan the top genes and look for those whose profile is highly enriched for your tissue of interest. For instance, FOXP3 ranks 25th in Tregs and has a very clear specificity for Tregs. In tissue view, use the menu at the top right to select a gene set. Look for the main transcription factors (TFs), kinases and enzymes in your tissue of interest, and check whether the top-scoring genes have a tissue-specific profile.
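Filtering a tissue's ranked list by a gene set can be sketched the same way. This assumes a hypothetical score table `scores[gene][tissue]` and a gene set given as a plain set of names (for example, a list of transcription factors you supply); neither reflects the site's internal data format, and the values below are invented.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical score table: scores[gene][tissue] -> enrichment score.
Scores = Dict[str, Dict[str, float]]


def top_genes_in_set(
    scores: Scores, tissue: str, gene_set: Set[str], top_n: int = 10
) -> List[Tuple[str, float, bool]]:
    """Rank genes from `gene_set` by their score in `tissue`.

    Each result is (gene, score, is_top_tissue), where is_top_tissue is True
    when `tissue` is the gene's highest-scoring tissue.
    """
    members = [g for g in scores if g in gene_set]
    members.sort(key=lambda g: scores[g][tissue], reverse=True)
    results = []
    for g in members[:top_n]:
        best_tissue = max(scores[g], key=lambda t: scores[g][t])
        results.append((g, scores[g][tissue], best_tissue == tissue))
    return results


scores = {
    "TF_1": {"Treg": 6.0, "Th1": 0.5},
    "TF_2": {"Treg": 3.0, "Th1": 5.0},
    "KINASE_1": {"Treg": 8.0, "Th1": 1.0},
}
transcription_factors = {"TF_1", "TF_2"}
for gene, score, is_top in top_genes_in_set(scores, "Treg", transcription_factors):
    print(gene, score, is_top)
# TF_1 6.0 True
# TF_2 3.0 False
```

`KINASE_1` scores highest in Tregs but is excluded because it is not in the selected gene set, mirroring how choosing a gene set in tissue view narrows the list.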
Finally, use our STORM prediction tool to find binding sites that may regulate genes of interest to you. Together, an enrichment profile and a relevant binding site strongly suggest a function.