BN: Bayes Network Analysis
Parameter Information
Location of Support File(s)
This option lets the user select the folder that contains all the support files needed to run BN.
Network Seed
If this option is selected, the user is expected to provide a file representing a network. The file should contain a list of edges, one per line, with the two nodes of each edge separated by a tab. Node identifiers must be of one of the types offered in the drop-down labeled “Select Seed UID”. Every edge is treated as directed, so node_A tab node_B is read as node_A to node_B. Cycles are not allowed.
A network seed can also be built using the “Create Network Seed” button, which lets the user create a list of edges by selecting nodes directly from the data. This option has limited features.
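The file format described above can be checked before it is loaded. The following is a minimal sketch, not part of MeV: the function names read_seed_edges and has_cycle are hypothetical, and the cycle check uses a standard topological-sort approach chosen here for illustration.

```python
from collections import defaultdict


def read_seed_edges(lines):
    """Parse 'node_A<TAB>node_B' lines into a list of directed (source, target) edges."""
    edges = []
    for line_no, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # ignore blank lines
        parts = line.split("\t")
        if len(parts) != 2 or not parts[0].strip() or not parts[1].strip():
            raise ValueError(f"line {line_no}: expected two tab-separated node IDs")
        edges.append((parts[0].strip(), parts[1].strip()))
    return edges


def has_cycle(edges):
    """Return True if the directed edges contain a cycle (Kahn's topological sort)."""
    children = defaultdict(set)
    indegree = defaultdict(int)
    nodes = set()
    for src, dst in edges:
        nodes.update((src, dst))
        if dst not in children[src]:
            children[src].add(dst)
            indegree[dst] += 1
    # Repeatedly remove nodes that have no incoming edges left.
    ready = [n for n in nodes if indegree[n] == 0]
    removed = 0
    while ready:
        node = ready.pop()
        removed += 1
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)
    # Any node that could not be removed lies on, or downstream of, a cycle.
    return removed != len(nodes)
```

A seed file could be checked with read_seed_edges(open(path)) followed by has_cycle(edges); a True result means the file would violate the no-cycles rule.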
A network seed can be used in one of three ways:
- Using the user network seed alone and bypassing literature-based network seeding altogether.
- Using the user network seed along with the Literature Mining seed. In this case, conflicts in direction (A to B in the user seed vs. B to A in Literature Mining) are resolved by giving precedence to the user-provided network seed.
- Using the user-provided network as the complete network. The network structure is not learned; only the Conditional Probability Tables (CPTs) associated with the network are learned for downstream exploration.
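The precedence rule in the second option can be illustrated as follows. This is a sketch under assumptions, not MeV's implementation: merge_seeds is a hypothetical name, and edges are represented as (source, target) tuples.

```python
def merge_seeds(user_edges, lit_edges):
    """Combine two directed edge lists, letting the user seed win direction conflicts."""
    merged = []
    seen = set()
    for edge in user_edges:
        if edge not in seen:
            seen.add(edge)
            merged.append(edge)
    user_set = set(seen)
    for src, dst in lit_edges:
        if (dst, src) in user_set:
            continue  # conflict: the user seed already has the opposite direction
        if (src, dst) not in seen:
            seen.add((src, dst))
            merged.append((src, dst))
    return merged
```

Note that this sketch only resolves pairwise direction conflicts; it does not check whether the merged edge list is free of longer cycles.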
Network Priors Sources
The checkboxes let the user select the sources of Bayesian prior probabilities used in constructing a seeded network.
Currently Literature Mining and KEGG priors are available. Protein-Protein Interaction as a source of priors is still under development.
As of now, the KEGG support files are automatically downloaded from the TM4 website by the application. The user is prompted for species information
if annotation is not available. Support files for all other prior sources must be provided by the user.
Discretize Expression Values
The data mining algorithm requires that the data be discretized into bins before it can be evaluated for network structure learning.
It is strongly recommended that the user keep the default value of 3, which means the data can exist in 3 states:
- Under expressed
- Over expressed
- Unchanged
The algorithm functions and reports meaningfully when the 3-state rule is followed.
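As an illustration of what 3-state discretization means, the sketch below bins one gene's expression values into equal-width intervals. How MeV actually chooses bin boundaries is not stated here, so equal-width binning is an assumption, and the names discretize and STATE_LABELS are hypothetical.

```python
# Labels for the default 3-state case, ordered from lowest to highest bin.
STATE_LABELS = ["Under expressed", "Unchanged", "Over expressed"]


def discretize(values, n_bins=3):
    """Map each value to a bin index 0..n_bins-1 using equal-width bins (assumed scheme)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # No variation at all: place every value in the middle bin.
        return [n_bins // 2] * len(values)
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so clamp it to the top bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]
```

For example, [STATE_LABELS[i] for i in discretize(row)] turns one row of expression values into the three state names listed above.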
Sample Classification
The samples or experiments can be classified based on prior knowledge, and the user may want to preserve that classification
when learning the network structure. In that case the user can specify a numerical value denoting the number of groups the samples belong
to. The default is 0, meaning no class difference; the algorithm performs best with this setting when the sample size is small. The same holds for large sample sizes,
but the user is strongly advised not to exceed 2 or 3 groups.
Note that the user is presented with a Classification Dialog where samples can be assigned to a group of the user's choice.
A sample can also be assigned as group neutral, even when the number of groups is set to 0. The sample classification dialog appears
once the user leaves the main dialog by clicking OK.
How to direct Edges for graph
The algorithm uses DFS (Depth First Search) to connect nodes in the initial seeded network. For large networks with many nodes this can take a while
to complete. The GO Term option for directing edges is not yet fully developed.
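The exact way DFS is applied is not described here. As one possible illustration only, the sketch below orients each undirected edge from the node that a depth-first search discovers first to the node it discovers later, which always yields a cycle-free set of directed edges. The function name direct_edges_dfs is hypothetical and this is not claimed to be MeV's procedure.

```python
def direct_edges_dfs(undirected_edges):
    """Orient undirected edges by DFS discovery order (illustrative assumption)."""
    neighbors = {}
    for a, b in undirected_edges:
        neighbors.setdefault(a, []).append(b)
        neighbors.setdefault(b, []).append(a)

    order = {}  # node -> position at which DFS first reached it
    for start in neighbors:
        if start in order:
            continue
        stack = [start]  # iterative DFS avoids recursion limits on large networks
        while stack:
            node = stack.pop()
            if node in order:
                continue
            order[node] = len(order)
            # Push in reverse so neighbors are explored in input order.
            for nxt in reversed(neighbors[node]):
                if nxt not in order:
                    stack.append(nxt)

    directed = []
    for a, b in undirected_edges:
        if a == b:
            continue  # a self-loop cannot be oriented without creating a cycle
        directed.append((a, b) if order[a] < order[b] else (b, a))
    return directed
```

Because every edge points from an earlier-discovered node to a later one, the discovery order is a valid topological order of the result, so no cycle can occur.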
Bootstrapping Parameters
The user has the option of bootstrapping the samples to generate random networks. This feature is optional.
This panel lets the user specify, in the 'Number of Iterations' box, how many times random samples will be generated.
The 'Confidence Threshold' box defines a confidence cut-off. The default of 0.7 means the algorithm will select an edge
if it appears in 70% of the bootstrap networks.
Note that if bootstrapping is chosen, the user can try different cut-offs after the algorithm runs, via a small dialog box.
Each new threshold creates a new network that can be viewed in Cytoscape via Gaggle broadcast.
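The effect of the confidence threshold can be sketched as follows. The function names edge_confidence and select_edges are hypothetical, each bootstrap network is assumed to be a list of (source, target) edges, and the cut-off is assumed to be inclusive ("at least 70%"), which is not stated explicitly.

```python
from collections import Counter


def edge_confidence(bootstrap_networks):
    """Return, for each edge, the fraction of bootstrap networks that contain it."""
    total = len(bootstrap_networks)
    if total == 0:
        raise ValueError("at least one bootstrap network is required")
    counts = Counter()
    for network in bootstrap_networks:
        counts.update(set(network))  # count each edge once per network
    return {edge: n / total for edge, n in counts.items()}


def select_edges(confidence, threshold=0.7):
    """Keep edges whose confidence meets the threshold (inclusive, by assumption)."""
    return sorted(edge for edge, c in confidence.items() if c >= threshold)
```

Calling select_edges again on the same confidence values with a different threshold mirrors trying out different cut-offs after a run, without regenerating the bootstrap networks.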
Population and Cluster Selection
The user has to choose a cluster for the BN algorithm to analyze. By default the first cluster is highlighted.
Note that there is a maximum number of genes that this algorithm can handle. If the chosen cluster exceeds this
limit, an error window is displayed showing the maximum allowable number of genes. At this point the user can choose another
cluster, if one is already defined and is below the limit. If no cluster of allowable size is defined, the user needs to cancel out
of the BN window, create new cluster(s) and then launch the BN Analysis window again.
Running BN Parameters
This tab allows the user to customize some advanced options of the algorithm. Most users can accept the default settings in this
panel. Below is a concise description of each available option:
- Search Algorithm - The algorithm used to search for the best network
- Scoring Scheme - The scoring mechanism used to choose among the top networks
- Use Arc Reversal -
- Max. Number of Parents - Maximum number of parents each network node may have
- Cross Validation Folds(K) - Number of cross-validation folds to use in the absence of a training dataset
Using Support Files created for standard arrays
We have pre-created the support files needed to run BN or LM analysis for some popular microarray platforms such as Affymetrix and Agilent. Currently we provide support files for 3 species: Human, Mouse and Rat. MeV comes preloaded with the files for 2 array types in the ~/data/BN_files folder.
- Affymetrix Human U133 Plus 2 Array
- Affymetrix Mouse 430 2 Array
- Support file FTP Location: Human, Mouse & Rat only
- ftp://occams.dfci.harvard.edu/pub/bio/tgi/data/Resourcerer/Human
- ftp://occams.dfci.harvard.edu/pub/bio/tgi/data/Resourcerer/Mouse
- ftp://occams.dfci.harvard.edu/pub/bio/tgi/data/Resourcerer/Rat
- File Naming Conventions:
- All BN/LM related files end with _BN.zip. E.g.: affy_HG-U133_Plus_2_BN.zip
- All files start with the array/chip vendor name, affy for Affymetrix. E.g.: affy_HG-U133_Plus_2_BN.zip
- Vendor name is followed by chip/array name. E.g.: affy_HG-U133_Plus_2_BN.zip
- Contents of zip files: All zip files contain 6 files
- affyID_accession.txt
- res.txt
- symArtsGeneDb.txt
- symArtsPubmed.txt
- all_ppi.txt
- gbGO.txt
- Steps to use the pre-designed support files. The example array chosen for illustration is Affymetrix Human U133 Plus 2. To use any array, follow the steps below:
- Download your species and array specific file from the FTP location mentioned above. E.g. affy_HG-U133_Plus_2_BN.zip
- Extract the contents of the zip under the following MeV directory: ~/data/BN_files/
- Once extracted, a folder by the array name will be created. In this case if the example file was downloaded, the following location will now exist: ~/data/BN_files/affy_HG-U133_Plus_2_BN
- Verify all 6 files exist.
- From MeV, launch the LM or BN module.
- In the start-up dialogue make sure the ‘File(s) Location’ box points to the folder where the supporting files are downloaded for the species and array concerned. If the example array was chosen, the text box should point to ./data/BN_files/affy_HG-U133_Plus_2_BN folder.
- Now you are ready to start the algorithm.
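The "verify all 6 files exist" step can be automated with a short script. This is a minimal sketch, not a tool shipped with MeV: the name missing_support_files is hypothetical, and the file names are taken from the zip contents listed above.

```python
from pathlib import Path

# The 6 files each *_BN.zip archive is expected to contain.
EXPECTED_FILES = [
    "affyID_accession.txt",
    "res.txt",
    "symArtsGeneDb.txt",
    "symArtsPubmed.txt",
    "all_ppi.txt",
    "gbGO.txt",
]


def missing_support_files(folder):
    """Return the expected support files that are not present in the given folder."""
    folder = Path(folder)
    return [name for name in EXPECTED_FILES if not (folder / name).is_file()]
```

For the example array, missing_support_files("data/BN_files/affy_HG-U133_Plus_2_BN") should return an empty list once the archive has been extracted correctly; the relative path used here assumes the script is run from the MeV installation directory.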