What is GSEA?

Gene Set Enrichment Analysis (GSEA) is a computational method that determines whether an a priori defined set of genes shows statistically significant, concordant differences between two biological states (e.g. phenotypes).
Reference: http://www.broad.mit.edu/gsea/

Why GSEA?

Traditional analyses apply an arbitrary cutoff to adjusted P-values, so genes with only slightly different P-values can fall on opposite sides of the threshold and be treated as different entities. Small differences in mRNA abundance are also often missed, as are large changes confined to just a few genes.

GSEA remedies this by using all the genes in your expression data for the analysis. GSEA also compiles per-gene statistics across genes within a gene set, allowing for the detection of small changes in many genes or large changes in few genes.
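As a rough illustration of how per-gene statistics can be combined within a gene set, the sketch below sums per-gene t-statistics over the members of a set and scales by the square root of the set size. This scaling is an assumption in the spirit of the Jiang and Gentleman approach, not a description of MeV's internal code; the function name and the t-statistic values are hypothetical.

```python
import math

def gene_set_statistic(gene_stats, gene_set):
    """Sum per-gene statistics over the set members found in the data,
    scaled by the square root of the number of members (assumed form)."""
    members = [g for g in gene_set if g in gene_stats]
    if not members:
        raise ValueError("no genes from the set are present in the data")
    return sum(gene_stats[g] for g in members) / math.sqrt(len(members))

# Hypothetical per-gene t-statistics (made up for illustration only).
t_stats = {"A": 0.9, "B": 1.1, "C": 1.0, "D": 1.0, "E": -0.2, "F": 4.0}

# Many small, concordant changes add up to a sizeable set-level statistic.
many_small = gene_set_statistic(t_stats, ["A", "B", "C", "D"])

# A single large change is also reflected at the set level.
one_large = gene_set_statistic(t_stats, ["F"])
```

No single gene in the first set would stand out on its own, yet the set-level statistic is clearly non-zero because the changes point in the same direction.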

The GSEA algorithm implemented in MeV v4.3 is based on a 2007 paper by Zhen Jiang and Robert Gentleman: Jiang, Z., Gentleman, R. (2007). Extensions to gene set enrichment analysis. Bioinformatics 23(3):306-13.

Brief Description of the GSEA Algorithm

The GSEA algorithm can be roughly divided into three steps:

How to Run GSEA

GSEA uses a series of parameter input dialogs that open sequentially, one for each step of the process. The first step in the process is to select all the requisite data and supplementary files.

The “Phenotype/Class Assignment” panel, as the name suggests, lets you assign phenotype/class labels to your samples. When you click the “Assign” button, a dialog box pops up asking for the number of factors.

For example, if sex is the phenotype (factor) that influences your data the most, enter 1 in the textbox provided.

The next step in assigning phenotypes is to specify the number of levels of each phenotype/factor.

In the example above, MALE and FEMALE are the two levels of the phenotype “SEX.” So, enter 2 in the textbox provided.

The final step in assigning phenotype labels is to specify which sample corresponds to which level.

Going by our example, Group1 and Group2 symbolize MALE and FEMALE.

You can save these groupings using the “Save settings” button. To load saved groupings, use the “Load settings” button. The “Reset” button clears all your choices. Once you are done, hit OK.

This is what you will see once the phenotype assignment is completed.

The “Geneset” panel allows you to load the gene set of your choice. You can load an existing geneset using the Browse button.

You can download genesets from the MIT/Broad website: http://www.broad.mit.edu/gsea/msigdb/downloads.jsp
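If you want to inspect a downloaded gene set file before loading it, the sketch below reads one in Python. It assumes the file is in the tab-delimited GMT layout (gene set name, description, then one gene per column), which is one of the formats distributed on that site. The function name and the example gene sets are hypothetical, and this is not how MeV itself reads the file.

```python
import io

def read_gmt(handle):
    """Parse GMT-style lines: name <TAB> description <TAB> gene1 <TAB> gene2 ..."""
    gene_sets = {}
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        name, _description, *genes = fields
        gene_sets[name] = [g for g in genes if g]
    return gene_sets

# Hypothetical two-line file standing in for a real download.
example = io.StringIO(
    "SET_A\thypothetical description\tGENE1\tGENE2\tGENE3\n"
    "SET_B\thypothetical description\tGENE2\tGENE4\n"
)
gene_sets = read_gmt(example)
```

For a real file, replace the `io.StringIO` object with `open(path)` on the downloaded file.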

The “Annotation” panel lets you upload annotations. Annotations are a MUST for running GSEA. Details on how to load annotations are described in the MeV Manual.

The next step in running GSEA is to assign parameters. On hitting the Next button, you will see something like this:

The GUI is largely self-explanatory, but the available methods for collapsing probes to genes deserve some clarification.

Sample Probe Values:
          Sample1   Sample2   Sample3   St. Dev
Probe_1   10        20        30        3
Probe_2   20        5         10        4
Probe_3   20        15        10        2

For a probe with missing values, Maximum Probe substitutes the largest value in each sample, so the substituted values for Samples 1-3 would be 20, 20, 30. Median Probe substitutes the median value in each sample, so the values would be 20, 15, 10. Standard Deviation uses the probe with the maximum standard deviation (Probe_2 here), so the values would be 20, 5, 10. NOTE: MeV calculates the SD on the fly; you do not have to do anything. The St. Dev column above is just an example.
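The three collapsing methods can be sketched in a few lines of Python using the sample probe values above. The function names are hypothetical, and the standard deviations are passed in as given in the table's illustrative St. Dev column rather than recomputed from the three sample values, so this is a reading of the example, not MeV's actual implementation.

```python
from statistics import median

# Sample probe values from the table (Sample1, Sample2, Sample3).
probes = {
    "Probe_1": [10, 20, 30],
    "Probe_2": [20, 5, 10],
    "Probe_3": [20, 15, 10],
}

# Illustrative standard deviations copied from the St. Dev column.
table_sd = {"Probe_1": 3, "Probe_2": 4, "Probe_3": 2}

def collapse_max(probes):
    """Maximum Probe: take the largest value in each sample."""
    return [max(column) for column in zip(*probes.values())]

def collapse_median(probes):
    """Median Probe: take the median value in each sample."""
    return [median(column) for column in zip(*probes.values())]

def collapse_max_sd(probes, sd):
    """Standard Deviation: keep the whole probe with the largest SD."""
    best_probe = max(probes, key=lambda name: sd[name])
    return list(probes[best_probe])

max_values = collapse_max(probes)
median_values = collapse_median(probes)
sd_values = collapse_max_sd(probes, table_sd)
```

Note that the first two methods mix values from different probes sample by sample, while the standard deviation method keeps one probe's values intact.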


The last step is to hit the "Execute" button.

The output from the algorithm is two sets of p values: “Lower p values” and “Upper p values” for the gene sets. Lower p values are the probability of seeing a test statistic lower than the observed one. Upper p values are the probability of seeing a test statistic higher than the observed one.
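To make the two definitions concrete, here is a minimal sketch that computes both values from a list of reference statistics, such as those obtained by recomputing the gene set statistic on permuted sample labels. The use of a permutation-style reference list, the strict inequalities, the function name, and the numbers are all assumptions for illustration.

```python
def lower_upper_p(observed, null_stats):
    """Return (lower p, upper p): the fractions of reference statistics
    strictly below and strictly above the observed statistic."""
    if not null_stats:
        raise ValueError("null_stats must not be empty")
    n = len(null_stats)
    lower = sum(1 for s in null_stats if s < observed) / n
    upper = sum(1 for s in null_stats if s > observed) / n
    return lower, upper

# Hypothetical reference statistics for one gene set.
null_stats = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.5]
lower_p, upper_p = lower_upper_p(2.0, null_stats)
```

Here only one of the eight reference statistics exceeds the observed value, so the upper p value is small while the lower p value is large; a gene set whose statistic is unusually low would show the opposite pattern.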

This is the result screen that you would see on clicking Table Viewer -> “Significant Genes”. MeV also lists the gene sets that were excluded from the analysis because they do not meet the minimum genes per gene set criterion. For each gene set, this is determined by counting the genes in the set that are also present in the expression data.
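The exclusion rule can be sketched as a simple overlap count. The function name, gene names, and threshold below are hypothetical; the sketch only illustrates the counting described above.

```python
def split_by_min_genes(gene_sets, expressed_genes, min_genes):
    """Separate gene sets into kept and excluded, based on how many of
    their genes are also present in the expression data."""
    expressed = set(expressed_genes)
    kept, excluded = {}, {}
    for name, genes in gene_sets.items():
        overlap = sorted(set(genes) & expressed)
        target = kept if len(overlap) >= min_genes else excluded
        target[name] = overlap
    return kept, excluded

# Hypothetical gene sets and expression data.
gene_sets = {
    "SET_A": ["GENE1", "GENE2", "GENE3", "GENE9"],
    "SET_B": ["GENE2", "GENE7", "GENE8"],
}
expressed_genes = ["GENE1", "GENE2", "GENE3", "GENE4"]

kept, excluded = split_by_min_genes(gene_sets, expressed_genes, min_genes=3)
```

With a minimum of three genes, SET_A is kept because three of its genes are in the expression data, while SET_B is excluded because only one is.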

Useful links

A description of the gene set files provided by MIT is available at: http://www.broad.mit.edu/gsea/msigdb/index.jsp