BN: Bayes Network Analysis

Parameter Information


Location of Support File(s)

This option lets the user select the location of all support files needed to run BN.

Network Seed

If this option is selected, the user is expected to provide a file representing a network. The file should contain a list of edges, one per line, with the two nodes of each edge separated by a tab. The node identifier must be one of the types offered in the drop-down labeled "Select Seed UID". Every edge is treated as directed, so that node_A tab node_B is read as node_A to node_B. Cycles are not allowed.
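As an illustration of the file format described above, the following is a minimal Python sketch that reads tab-separated edges and checks that they contain no cycle. It is not part of MeV; the function names parse_seed and has_cycle and the node names in the usage lines are hypothetical.

```python
from collections import defaultdict


def parse_seed(text):
    """Parse 'source<TAB>target' lines into a list of directed edges."""
    edges = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        parts = line.split("\t")
        if len(parts) != 2 or not all(p.strip() for p in parts):
            raise ValueError(f"line {line_no}: expected two tab-separated node IDs")
        edges.append((parts[0].strip(), parts[1].strip()))
    return edges


def has_cycle(edges):
    """Return True if the directed edges contain a cycle (Kahn's algorithm)."""
    children = defaultdict(list)
    indegree = defaultdict(int)
    nodes = set()
    for src, dst in edges:
        children[src].append(dst)
        indegree[dst] += 1
        nodes.update((src, dst))
    # Start from nodes with no incoming edge and peel the graph layer by layer.
    ready = [n for n in nodes if indegree[n] == 0]
    visited = 0
    while ready:
        node = ready.pop()
        visited += 1
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)
    # Any node never reached sits on a cycle or downstream of one.
    return visited != len(nodes)


# Hypothetical seed: node_A -> node_B -> node_C
seed_edges = parse_seed("node_A\tnode_B\nnode_B\tnode_C\n")
seed_is_valid = not has_cycle(seed_edges)
```

A seed file that fails this kind of check (for example one containing both node_A to node_B and node_B to node_A) would violate the "no cycles" rule.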

A network seed can also be built using the "Create Network Seed" button, which lets the user create a list of edges by selecting nodes directly from the data. This option has limited features.

A network seed can be used in one of three ways:
  1. Using the user network seed alone and bypassing literature-based network seeding altogether.
  2. Using the user network seed along with the Literature Mining seed. In this case conflicts in direction (A to B in the user seed vs. B to A in Literature Mining) are resolved by giving precedence to the user-provided network seed.
  3. Using the user-provided network as the complete network. The network structure is not learned; only the Conditional Probability Tables (CPTs) associated with the network are learned for downstream exploration.
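The precedence rule in the second mode can be sketched in Python as follows. This is an illustrative sketch only, not the MeV implementation; merge_seeds and the edge lists are hypothetical names.

```python
def merge_seeds(user_edges, lit_edges):
    """Combine two directed edge lists; on a direction conflict keep the user's edge."""
    user_set = set(user_edges)
    merged = list(dict.fromkeys(user_edges))  # drop duplicates, keep order
    seen = set(merged)
    for src, dst in lit_edges:
        if (dst, src) in user_set:
            continue  # user seed says dst -> src, so the literature edge is dropped
        if (src, dst) not in seen:
            merged.append((src, dst))
            seen.add((src, dst))
    return merged


user_seed = [("A", "B")]
lit_seed = [("B", "A"), ("B", "C")]
combined = merge_seeds(user_seed, lit_seed)
```

Here the literature edge B to A conflicts with the user edge A to B and is discarded, while B to C is kept.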

Network Priors Sources

The checkboxes let the user select the sources of Bayesian prior probabilities used in constructing a seeded network. Currently Literature Mining and KEGG priors are available. Protein-Protein Interaction as a source of priors is still under development.

As of now, the KEGG support files are automatically downloaded from the TM4 website by the application. The user is prompted for species information if annotation is not available. All other prior sources must be made available by the user.

Discretize Expression Values

The data mining algorithm requires that the data be discretized into bins before it can be evaluated for network structure learning. It is strongly recommended that the user select the default value of 3, which means the data can exist in 3 states:
  1. Under expressed
  2. Over expressed
  3. Unchanged
The algorithm functions and reports meaningfully when the 3-state rule is followed.
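This section does not say how the bin boundaries are chosen. The Python sketch below assumes equal-width bins over the values of a single gene, which is only one possible scheme; the function name discretize and the sample values are hypothetical.

```python
STATES = ["Under expressed", "Unchanged", "Over expressed"]


def discretize(values, bins=3):
    """Assign each value to one of `bins` equal-width intervals (0 = lowest)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a gene with no variation is placed in the middle bin.
        return [bins // 2] * len(values)
    width = (hi - lo) / bins
    # The maximum value would fall just past the last bin, so clamp it.
    return [min(int((v - lo) / width), bins - 1) for v in values]


expression = [-2.0, -0.1, 0.0, 0.2, 2.0]  # hypothetical log-ratios for one gene
labels = [STATES[i] for i in discretize(expression)]
```

With the default of 3 bins the lowest interval corresponds to "Under expressed", the middle one to "Unchanged" and the highest to "Over expressed".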

Sample Classification

The samples or experiments can be classified based on prior knowledge, and the user may want to preserve that classification when learning the network structure. In that case the user can specify a numerical value denoting the number of groups the samples belong to. The default is 0, meaning no class difference; the algorithm performs best with this setting when the sample size is small. The same is true for large sample sizes, but the user is strongly recommended not to exceed 2 or 3 groups.

Note that the user is presented with a Classification Dialog where samples can be assigned to a group of the user's choice. A sample can also be assigned as group neutral, even when the number of groups is set to 0. The sample classification dialog appears once the user leaves the main dialog by clicking OK.

How to direct Edges for graph

The algorithm uses DFS (Depth First Search) to connect nodes in the initial seeded network. For large networks with many nodes this can take a while to complete. The GO Term option for directing edges is not yet fully developed.
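This section does not describe how DFS is applied. One common way to direct an undirected edge list with depth-first search, shown here purely as an assumption and not as the MeV method, is to point every edge from the node discovered earlier to the node discovered later, which always yields an acyclic result. The name orient_by_dfs is hypothetical.

```python
def orient_by_dfs(undirected_edges):
    """Direct each edge from the earlier-discovered node to the later one."""
    neighbors = {}
    for a, b in undirected_edges:
        neighbors.setdefault(a, []).append(b)
        neighbors.setdefault(b, []).append(a)

    order = {}  # node -> DFS discovery index
    for root in neighbors:
        if root in order:
            continue
        stack = [root]
        while stack:
            node = stack.pop()
            if node in order:
                continue
            order[node] = len(order)
            # Push in reverse so neighbors are visited in their listed order.
            stack.extend(reversed(neighbors[node]))

    return [(a, b) if order[a] < order[b] else (b, a) for a, b in undirected_edges]


directed = orient_by_dfs([("A", "B"), ("B", "C"), ("C", "A")])
```

Because every edge runs from a lower to a higher discovery index, no directed cycle can form, which is consistent with the "no cycles" requirement for seeds.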

Bootstrapping Parameters

The user has the option of bootstrapping the samples to generate random networks. This feature is optional. The 'Number of Iterations' box sets the number of times random samples will be generated. The 'Confidence Threshold' box defines a confidence level cut-off. The default of 0.7 means the algorithm will select an edge if it appears in 70% of the bootstrap networks.
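The two parameters can be illustrated with a short Python sketch. It assumes that samples are resampled with replacement and that an edge is kept when its frequency is at least the threshold; both are assumptions rather than documented MeV behavior, and the names resample, edge_confidence and select_edges are hypothetical.

```python
import random
from collections import Counter


def resample(samples, rng):
    """Draw a bootstrap sample: same size as the input, with replacement."""
    return rng.choices(samples, k=len(samples))


def edge_confidence(networks):
    """Fraction of bootstrap networks in which each directed edge appears."""
    counts = Counter(edge for net in networks for edge in set(net))
    return {edge: n / len(networks) for edge, n in counts.items()}


def select_edges(networks, threshold=0.7):
    """Keep edges whose bootstrap frequency reaches the confidence threshold."""
    confidence = edge_confidence(networks)
    return sorted(edge for edge, c in confidence.items() if c >= threshold)


# Hypothetical result of 10 iterations: A->B learned 8 times, B->C learned 5 times.
boot_networks = [[("A", "B"), ("B", "C")]] * 5 + [[("A", "B")]] * 3 + [[]] * 2
kept = select_edges(boot_networks, threshold=0.7)

one_draw = resample(["s1", "s2", "s3", "s4"], random.Random(0))
```

Lowering the threshold to 0.5 in this example would also keep B to C, which is the kind of trade-off the cut-off controls.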

Note that if bootstrap is chosen, the user is given a chance to try different cut-offs after the algorithm runs, via a small dialog box. A new network is created for each new threshold and can be viewed in Cytoscape via Gaggle broadcast.


Population and Cluster Selection

The user must choose the cluster that the BN algorithm will use to run the analysis. By default the first cluster is highlighted.

Note that there is a maximum limit on the number of genes this algorithm can handle. If the chosen cluster exceeds that limit, an error window is displayed showing the maximum allowable number of genes. At this point the user can choose another cluster, if one is already defined and is below the limit. If no cluster of allowable size is defined, the user needs to cancel out of the BN window, create new cluster(s) and then launch the BN Analysis window again.

Running BN Parameters

This tab allows the user to customize some advanced options of the algorithm. Most users can accept the default settings in this panel. Below is a concise description of each available option:
  1. Search Algorithm - The algorithm used to search for the best network
  2. Scoring Scheme - The scoring mechanism used to choose among the top networks
  3. Use Arc Reversal -
  4. Max. Number of Parents - Maximum number of parents each network node may have
  5. Cross Validation Folds(K) - Number of cross-validation folds to use in the absence of a training dataset
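For readers unfamiliar with the K in Cross Validation Folds(K), the Python sketch below shows the general idea of K-fold splitting: the samples are divided into K parts and each part serves once as the held-out test set. It illustrates the general technique only and makes no claim about how MeV partitions samples; k_fold_indices is a hypothetical name.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of the k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    # Deal sample indices round-robin so fold sizes differ by at most one.
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i, test in enumerate(folds):
        train = sorted(idx for j, fold in enumerate(folds) if j != i for idx in fold)
        yield train, test


splits = list(k_fold_indices(n_samples=6, k=3))
```

With 6 samples and K = 3, each fold holds out 2 samples and trains on the remaining 4.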

Using Support Files created for standard arrays

We have pre-created the support files needed to run BN or LM analysis for some popular microarray platforms such as Affymetrix and Agilent. Currently we provide support files for 3 species: Human, Mouse and Rat. MeV comes preloaded with the files for 2 array types in the ~/data/BN_files folder:
  1. Affymetrix Human U133 Plus 2 Array
  2. Affymetrix Mouse 430 2 Array