EASE: Annotation Over-representation Analysis
Statistical Parameter Information
The sections on this page specify the reported statistic and the result trimming parameters.
Reported Statistic
Fisher's Exact Probability
The Fisher's Exact Probability is the probability of observing, by chance alone, at least as many
genes for a biological theme in the cluster of interest as were actually found, given the
representation of that theme in the total gene population. A low value indicates that the theme is
over-represented in the cluster. For example, suppose that one has a gene
list of 50 genes from a population of 10,000 genes. Now suppose that 10 of the 50 genes were related to
pathway "A" but only 13 genes in the total population were associated with pathway "A". This scenario
would yield a low probability that the observed number of hits (occurrences of pathway "A") within the small
sample could be due to chance alone. This statistic is based on the hypergeometric distribution and has
benefits over chi-square in that it is appropriate for finite populations. The reference cited for EASE
describes this statistic at length.
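The pathway "A" example above can be worked through with a short Python sketch. This is an illustration of a one-sided hypergeometric tail probability, not the EASE implementation itself; the function name fisher_upper_tail and its parameter names are assumptions made for this example.

```python
from math import comb

def fisher_upper_tail(list_hits, list_size, pop_hits, pop_size):
    """Probability of drawing at least `list_hits` annotated genes when
    `list_size` genes are drawn without replacement from a population of
    `pop_size` genes, of which `pop_hits` carry the annotation."""
    max_hits = min(list_size, pop_hits)
    total = comb(pop_size, list_size)
    tail = sum(
        comb(pop_hits, k) * comb(pop_size - pop_hits, list_size - k)
        for k in range(list_hits, max_hits + 1)
    )
    return tail / total

# Example from the text: 10 of 50 list genes hit pathway "A",
# which has 13 genes in a population of 10,000.
p = fisher_upper_tail(10, 50, 13, 10_000)
print(f"Fisher's exact probability: {p:.3g}")
```

The printed value is extremely small, which matches the statement that such a result is unlikely to be due to chance alone.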
EASE Score
The EASE Score is essentially a jackknifed Fisher's Exact Probability: it is calculated as
the Fisher's Exact Probability after one occurrence (a list hit for the term) has been removed.
This makes the score more conservative for terms that are supported by only a few genes.
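As a rough sketch of the jackknife idea, the following Python example removes one list hit before computing the hypergeometric tail probability. It assumes that only the hit count is reduced while the list size and population counts are left unchanged; the exact contingency table used by EASE may differ. The names upper_tail and ease_score are hypothetical.

```python
from math import comb

def upper_tail(list_hits, list_size, pop_hits, pop_size):
    """One-sided hypergeometric tail: P(hits >= list_hits)."""
    max_hits = min(list_size, pop_hits)
    total = comb(pop_size, list_size)
    tail = sum(
        comb(pop_hits, k) * comb(pop_size - pop_hits, list_size - k)
        for k in range(list_hits, max_hits + 1)
    )
    return tail / total

def ease_score(list_hits, list_size, pop_hits, pop_size):
    """Tail probability after removing one list hit.
    Assumption: all other counts stay as they were."""
    reduced_hits = max(list_hits - 1, 0)
    return upper_tail(reduced_hits, list_size, pop_hits, pop_size)

fisher = upper_tail(10, 50, 13, 10_000)
ease = ease_score(10, 50, 13, 10_000)
print(f"Fisher: {fisher:.3g}  EASE score: {ease:.3g}")
```

Under this assumption the EASE Score is never smaller than the Fisher's Exact Probability, and a term supported by a single hit receives a score of 1.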
Multiplicity Corrections
Several p-value corrections can be applied to help correct for the chance of arriving at a significant
result when performing multiple tests.
Bonferroni Correction
This correction simply multiplies the statistic by the number of results generated. This is the most
stringent correction of the three options.
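A minimal Python sketch of this correction follows. Capping the corrected value at 1 is an assumption added here so that the result remains a valid probability; the function name bonferroni is illustrative.

```python
def bonferroni(p_values):
    """Multiply each value by the number of results.
    Assumption: corrected values are capped at 1.0."""
    n = len(p_values)
    return [min(1.0, p * n) for p in p_values]

corrected = bonferroni([0.01, 0.2, 0.5])
print(corrected)
```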
Bonferroni Step Down Correction
This modified Bonferroni correction ranks the results by the statistic in ascending order. Each
value is multiplied by (n-rank), where n is the number of results. In the case of a tie, where two
results have the same probability, the rank is kept constant until the next element with
a higher probability value occurs. The rank is then adjusted for the number of tied elements over which it was held constant.
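The ranking and tie handling described above might be sketched in Python as follows. The sketch assumes that rank counts from zero, so that the smallest value is multiplied by n, and that corrected values are capped at 1; neither detail is stated in the text. The function name step_down_bonferroni is hypothetical.

```python
def step_down_bonferroni(p_values):
    """Multiply each value by (n - rank), with values ranked in ascending order.
    Assumptions: rank starts at 0, and results are capped at 1.0."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    corrected = [0.0] * n
    rank = 0
    for pos, idx in enumerate(order):
        # Tied values keep the same rank; once a higher value occurs,
        # the rank jumps ahead by the number of tied elements.
        if pos > 0 and p_values[idx] != p_values[order[pos - 1]]:
            rank = pos
        corrected[idx] = min(1.0, p_values[idx] * (n - rank))
    return corrected

# Two results tie at 0.01, so both are multiplied by n = 4.
adjusted = step_down_bonferroni([0.01, 0.04, 0.01, 0.03])
print(adjusted)
```

In this example the value 0.03 is multiplied by 2 rather than 3, because the rank skips over both tied elements.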
Sidak Method
This correction uses the following formula, where v is the original value, v' is the corrected value, and k is the rank of the result
in terms of original statistic value. Ties in rank are handled as described for the Bonferroni step down correction.
v' = 1 - (1 - v)^k
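The Sidak formula, with k as an exponent, can be evaluated directly. The Python sketch below takes k as an argument rather than deriving it from the ranking, since the exact mapping from rank to k is left to the description above; the function name sidak is an assumption.

```python
def sidak(v, k):
    """Corrected value v' = 1 - (1 - v)^k for original value v and exponent k."""
    return 1.0 - (1.0 - v) ** k

print(sidak(0.01, 1))
print(sidak(0.01, 5))
```

With k = 1 the value is unchanged, and larger k gives a larger corrected value that never exceeds 1.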
Resampling Probability Analysis
The resampling option performs a number of analysis iterations in which random
gene lists of the original cluster size are selected from the population without replacement.
The end result reported for a particular term is the probability of obtaining the determined
significance level by chance.
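One way to picture the resampling procedure is the Python sketch below. It assumes the comparison is made per term, and it compares hit counts instead of probabilities: with the list size and population fixed, a random list reaches the observed significance level exactly when it has at least as many hits. The function name, the seed parameter, and the gene identifiers are all invented for this illustration.

```python
import random

def resampling_probability(population, term_genes, cluster, iterations=1000, seed=0):
    """Fraction of random gene lists (same size as the cluster, drawn without
    replacement) that have at least as many hits for the term as the cluster."""
    rng = random.Random(seed)
    term_genes = set(term_genes)
    observed_hits = sum(gene in term_genes for gene in cluster)
    as_extreme = 0
    for _ in range(iterations):
        sample = rng.sample(population, len(cluster))
        hits = sum(gene in term_genes for gene in sample)
        if hits >= observed_hits:
            as_extreme += 1
    return as_extreme / iterations

# Hypothetical data: 200 genes, 10 annotated with the term,
# and a cluster of 10 genes containing 6 of the annotated ones.
population = [f"gene{i}" for i in range(200)]
term_genes = population[:10]
cluster = population[:6] + population[100:104]
prob = resampling_probability(population, term_genes, cluster, iterations=200)
print(prob)
```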
Trim Parameters
The trim parameters can be applied to filter analysis results based on the number of hits
or the fraction of genes in the cluster that are represented by an annotation term. Sometimes
a term can be found significant but does not represent a large segment of the cluster of interest.
These options can be applied to be certain that a minimum number of genes in the cluster fall under
that particular annotation class. This feature should be used with caution so that biological
themes represented by very few genes are not excluded.
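The effect of the trim parameters can be illustrated with a small Python sketch. The result records, field names, and the function trim_results are hypothetical and are not taken from the EASE software.

```python
def trim_results(results, cluster_size, min_hits=None, min_fraction=None):
    """Keep results that meet the minimum hit count and/or the minimum
    fraction of cluster genes represented by the annotation term."""
    kept = []
    for result in results:
        hits = result["hits"]
        if min_hits is not None and hits < min_hits:
            continue
        if min_fraction is not None and hits / cluster_size < min_fraction:
            continue
        kept.append(result)
    return kept

# Hypothetical results for a cluster of 50 genes.
results = [
    {"term": "pathway A", "hits": 10},
    {"term": "pathway B", "hits": 2},
    {"term": "pathway C", "hits": 5},
]
by_hits = trim_results(results, cluster_size=50, min_hits=3)
by_fraction = trim_results(results, cluster_size=50, min_fraction=0.1)
print([r["term"] for r in by_hits])
print([r["term"] for r in by_fraction])
```

Here "pathway B" is dropped by either setting, which shows how a theme represented by very few genes can be excluded even if it is significant.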