EASE: Annotation Over-representation Analysis

Statistical Parameter Information


Several sections on this page are used to specify the reported statistic and the result-trimming parameters.

Reported Statistic

Fisher's Exact Probability

The Fisher's Exact Probability gauges whether a biological theme is over-represented in the cluster of interest relative to the representation of that theme in the total gene population. For example, suppose that one has a gene list of 50 genes from a population of 10,000 genes. Now suppose that 10 of the 50 genes are related to pathway "A" but only 13 genes in the total population are associated with pathway "A". This scenario would yield a low probability that the observed number of hits (occurrences of pathway "A") within the small sample could be due to chance alone. This statistic is based on the hypergeometric distribution and has the advantage over the chi-square test of being appropriate for finite populations. The reference cited for EASE describes this statistic at length.
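The pathway "A" example can be worked through numerically. The following is a minimal sketch, assuming the statistic is the one-sided upper tail of the hypergeometric distribution (the probability of drawing at least the observed number of hits); the function name `fisher_upper_tail` and its parameter names are illustrative and are not taken from EASE itself.

```python
from math import comb

def fisher_upper_tail(list_hits, list_size, pop_hits, pop_size):
    """P(X >= list_hits) when list_size genes are drawn without replacement
    from pop_size genes, pop_hits of which carry the annotation.
    Assumption: a one-sided (over-representation) test."""
    max_hits = min(list_size, pop_hits)
    total = comb(pop_size, list_size)
    tail = sum(
        comb(pop_hits, k) * comb(pop_size - pop_hits, list_size - k)
        for k in range(list_hits, max_hits + 1)
    )
    return tail / total

# The example from the text: 10 of 50 list genes hit pathway "A",
# which has 13 member genes in a population of 10,000.
p = fisher_upper_tail(list_hits=10, list_size=50, pop_hits=13, pop_size=10_000)
print(f"{p:.3e}")
```

The printed value is extremely small, consistent with the statement that such an outcome is unlikely to arise by chance alone.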

EASE Score

The EASE Score is essentially a jackknifed Fisher's Exact Probability: it is calculated as the Fisher's Exact Probability after one occurrence (a list hit for the term) has been removed.
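As a rough illustration of the jackknife idea, the sketch below recomputes a hypergeometric upper-tail probability with one list hit removed. It rests on two assumptions that the text does not spell out: the test is one-sided, and only the list-hit count is decremented while the list size and population counts stay fixed. Other conventions (for example, also shrinking the list by one gene) would give slightly different values. The names `upper_tail` and `ease_score` are illustrative.

```python
from math import comb

def upper_tail(hits, list_size, pop_hits, pop_size):
    """One-sided hypergeometric probability P(X >= hits)."""
    total = comb(pop_size, list_size)
    tail = sum(
        comb(pop_hits, k) * comb(pop_size - pop_hits, list_size - k)
        for k in range(hits, min(list_size, pop_hits) + 1)
    )
    return tail / total

def ease_score(list_hits, list_size, pop_hits, pop_size):
    """Fisher-style probability after removing one list hit.
    Assumption: only the hit count changes; all other counts are kept."""
    return upper_tail(max(list_hits - 1, 0), list_size, pop_hits, pop_size)

raw = upper_tail(10, 50, 13, 10_000)
jackknifed = ease_score(10, 50, 13, 10_000)
print(f"raw={raw:.3e}  jackknifed={jackknifed:.3e}")
```

Because one supporting hit is removed, the jackknifed value is never smaller than the raw probability, which penalizes terms supported by very few genes.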

Multiplicity Corrections

Several p-value corrections can be applied to account for the increased chance of arriving at a significant result when performing multiple tests.

Bonferroni Correction

This correction simply multiplies the statistic by the number of results generated. This is the most stringent correction of the three options.
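A minimal sketch of this correction follows. Capping the corrected value at 1.0 is an assumption added here so the result remains a valid probability; the text does not say whether EASE applies such a cap. The function name `bonferroni` is illustrative.

```python
def bonferroni(p_values):
    """Multiply each value by the number of results.
    Assumption: corrected values are capped at 1.0."""
    n = len(p_values)
    return [min(1.0, p * n) for p in p_values]

print(bonferroni([0.01, 0.04, 0.5]))
```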

Bonferroni Step Down Correction

This modified Bonferroni correction ranks the results by the statistic in ascending order. Each value is multiplied by (n-rank), where n is the number of results. In the case of a tie, where two results have the same probability, the rank is kept constant until the next element with a higher probability value occurs. The rank is then adjusted for the number of tied elements over which it was held constant.
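The ranking and tie handling described above can be sketched as follows. Two assumptions are made that the text does not state: ranks start at 0, so that the smallest value is multiplied by n, and corrected values are capped at 1.0. The function name `bonferroni_step_down` is illustrative.

```python
def bonferroni_step_down(p_values):
    """Multiply each value by (n - rank), with ranks taken in ascending order.
    Assumptions: ranks start at 0; corrected values are capped at 1.0."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    corrected = [0.0] * n
    rank = 0
    for pos, idx in enumerate(order):
        # Tied values keep the same rank; when a higher value appears,
        # the rank jumps ahead by the number of tied elements.
        if pos > 0 and p_values[idx] > p_values[order[pos - 1]]:
            rank = pos
        corrected[idx] = min(1.0, p_values[idx] * (n - rank))
    return corrected

# Ranks are 0, 1, 1, 3, so the multipliers are 4, 3, 3, 1.
print(bonferroni_step_down([0.01, 0.02, 0.02, 0.03]))
```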

Sidak Method

This correction uses the following formula, where v is the original value, v' is the corrected value, and k is the rank of the result in terms of original statistic value. Ties in rank are handled as described for the Bonferroni step down correction.
v' = 1 - (1 - v)^k
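The formula can be evaluated directly, reading k as an exponent. The sketch below takes k as an argument supplied by the caller rather than deriving it, since the exact ranking convention is described only in prose; the function name `sidak` is illustrative.

```python
def sidak(v, k):
    """Sidak-corrected value v' = 1 - (1 - v)**k for original value v."""
    return 1.0 - (1.0 - v) ** k

print(sidak(0.01, 5))
```

For small v the result is close to k times v, which is why this correction behaves much like the Bonferroni variants while never exceeding 1.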

Resampling Probability Analysis

The resampling option performs a number of analysis iterations, in each of which a random gene list of the original cluster size is selected from the population without replacement. The result reported for a particular term is the probability of obtaining that term's observed significance level by chance.
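One way to picture the resampling procedure is the sketch below. It assumes that "obtaining the significance level by chance" means the fraction of random lists whose probability for the term is at least as small as the observed one, and it uses a one-sided hypergeometric probability as the statistic. The function names, the default iteration count, and the fixed seed are all illustrative choices and are not taken from EASE.

```python
import random
from math import comb

def upper_tail(hits, list_size, pop_hits, pop_size):
    """One-sided hypergeometric probability P(X >= hits)."""
    total = comb(pop_size, list_size)
    tail = sum(
        comb(pop_hits, k) * comb(pop_size - pop_hits, list_size - k)
        for k in range(hits, min(list_size, pop_hits) + 1)
    )
    return tail / total

def resampling_probability(population, term_genes, gene_list,
                           iterations=1000, seed=0):
    """Fraction of random lists at least as significant as the real list."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    term = set(term_genes) & set(population)
    size, pop_size, pop_hits = len(gene_list), len(population), len(term)
    observed = upper_tail(sum(g in term for g in gene_list),
                          size, pop_hits, pop_size)
    as_extreme = 0
    for _ in range(iterations):
        # Draw a random list of the original size without replacement.
        sample = rng.sample(population, size)
        hits = sum(g in term for g in sample)
        if upper_tail(hits, size, pop_hits, pop_size) <= observed:
            as_extreme += 1
    return as_extreme / iterations

population = list(range(200))                      # hypothetical gene IDs
term_genes = range(10)                             # genes annotated with the term
gene_list = list(range(8)) + list(range(100, 112)) # 20 genes, 8 hits
print(resampling_probability(population, term_genes, gene_list, iterations=200))
```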

Trim Parameters

The trim parameters can be applied to filter analysis results based on the number of hits or the fraction of genes in the cluster that are represented by an annotation term. A term can sometimes be found significant without representing a large segment of the cluster of interest. These options can be applied to ensure that a minimum number of genes in the cluster fall under a particular annotation class. This feature should be used with caution so that biological themes represented by very few genes are not excluded.
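The two trim criteria can be sketched as a simple filter. The result layout (pairs of term name and hit count) and the names `trim_results`, `min_hits`, and `min_fraction` are hypothetical and chosen only for illustration.

```python
def trim_results(results, list_size, min_hits=None, min_fraction=None):
    """Keep (term, hits) pairs that meet the hit-count and fraction thresholds.
    A threshold left as None is not applied."""
    kept = []
    for term, hits in results:
        if min_hits is not None and hits < min_hits:
            continue
        if min_fraction is not None and hits / list_size < min_fraction:
            continue
        kept.append((term, hits))
    return kept

results = [("A", 10), ("B", 2), ("C", 5)]
print(trim_results(results, list_size=50, min_hits=3))
print(trim_results(results, list_size=50, min_fraction=0.15))
```

Setting either threshold too high removes small themes such as term "B" here, which is the risk the caution above refers to.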