NMF: Non-negative Matrix Factorization

Parameter Information


Brunet, J-P., Tamayo, P., Golub, T.R., and Mesirov, J.P. 2004. Metagenes and molecular pattern discovery using matrix factorization. /Proc. Natl. Acad. Sci. USA/ 101(12):4164–4169.
Devarajan K, 2008 Nonnegative Matrix Factorization: An Analytical and Interpretive Tool in Computational Biology. PLoS Comput Biol 4(7): e1000029. doi:10.1371/journal.pcbi.1000029
Lee DD, Seung SH (2001) Algorithms for nonnegative matrix factorization. Adv Neural Inform Process Syst 13: 556–562.
Non-negative Matrix Factorization, a technique which makes use of an algorithm based on decomposition by parts of an extensive data matrix into a small number of relevant metagenes. NMF’s ability to identify expression patterns and make class discoveries has been shown to able to have greater robustness over popular clustering techniques such as HCL and SOM.
MeV’s NMF uses a multiplicative update algorithm, introduced by Lee and Seung in 2001, to factor a non-negative data matrix into two factor matrices referred to as W and H. Associated with each factorization is a user-specified rank. This represents the columns in W, the rows in H, and the number of clusters to which the samples are to be assigned. Starting with randomly seeded matrices and using an iterative approach with a specified cost measurement we can reach a locally optimal solution for these factor matrices. H and W can then be evaluated as metagenes and metagenes expression patterns, respectively. Using a “winner-take-all” approach, samples can be assigned to clusters based on their highest metagenes expression. Multiple iterations of this process allow us to see the robustness of the cluster memberships. Additionally, running multiple ranks consecutively can allow for the comparison between differing numbers of classes using cophenetic correlation.
NMF is most frequently used to make class discoveries through identification of molecular patterns. The module can also be used to cluster genes, generating metasamples rather than metagenes.

Parameters:

Sample Selection

The sample selection option indicates whether to cluster genes or samples.

Run multiple ranks

This checkbox determines if NMF will be performed on one rank or compared across multiple ranks. Selecting this option will allow the user to choose a rank range.

Number of runs

Specifies the number of factorization runs to perform on each rank. Typically, this value is between 20 and 100.

Rank Value/ Rank Range

The value or range of ranks for which NMF is performed.

Maximum iterations

The maximum number of iterations to be completed as W and H approach a local optimization.

Always perform maximum iterations

Specifies whether or not MeV should complete the maximum number of iterations or stop after a certain convergence has been reached.

Cost convergence cutoff

The point at which a run can be halted based on sufficient convergence.

Check Frequency

The frequency, in iterations, of checking for convergence.

Update rules and cost measurement

The algorithmic technique, as described by Lee and Seung, for iteratively updating the factor matrices and the manner in which their cost is measured. The default is “Divergence”.

Data matrix pre-processing

A requirement for running NMF is a data matrix free of negative values. If your data includes negative values, a method must be selected to adjust the data. If “Always adjust data” is checked, the selected operation will be performed, regardless of your data. If unchecked, the selected operation will only be performed if a search through the data reveals negative values. After running your analysis, the preprocessing step (if any) will be recorded in the General Info tab.
Subtract minimum value means that the lowest value in the data, regardless of negativity, will be subtracted from all values, ensuring a non-negative matrix.
Exponentially scale means that every value will be exponentiated, base 2.

Random Number Generation

For the purpose of reproducibility, NMF allows the addition of a seed value for the creation of initial generation of W and H matrices. Use of the same seed value on runs of NMF with identical parameters will result in identical results.

Store results as clusters

Selecting this option will cause the creation and storage of the results as clusters in MeV’s clustering system. These clusters will be visible in the Cluster Manager as well as in MeV’s experiment viewers.


For more information please see the MeV manual.