
Code and Usage

Source code
Download gmm_crossspeciescluster.tgz. After downloading, type:

tar -xvzf gmm_crossspeciescluster.tgz

This will create a directory in the current path called gmm_crosspecies. Change into that directory and type make. The executable is named arboretum.
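The extract-and-build steps above can be sketched as a small script. This is only an illustration: the function names top_level_dir and build_arboretum are hypothetical and not part of the distribution, and the script reads the directory name from the archive itself instead of hard-coding it.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not shipped with Arboretum).

# Print the top-level directory stored in a .tgz archive:
# the first path component of the first archive entry.
top_level_dir() {
    tar -tzf "$1" | head -n 1 | sed 's|^\./||' | cut -d/ -f1
}

# Extract the archive, then run make inside the extracted directory.
# The subshell keeps the caller's working directory unchanged.
build_arboretum() {
    local archive="$1"
    local dir
    dir=$(top_level_dir "$archive") || return 1
    tar -xzf "$archive" || return 1
    ( cd "$dir" && make )
}
```

Calling build_arboretum gmm_crossspeciescluster.tgz from the download directory would then extract the archive and run make in the extracted directory.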
Third party libraries
Arboretum needs the GNU Scientific Library (GSL) to run. We provide libraries compiled for the x86 64-bit Linux platform.
Usage
./arboretum specorder orthogroup maxk speciestree clusterassignments rand[rseed|none] outputDir mode[learn|generate] srcSpecies inittype[uniform|branchlength] p_diagonal_nonleaf
 
The parameters have the following interpretation:
  1. specorder: List of species
  2. orthogroup: Orthogroup mapping
  3. maxk: Number of clusters
  4. speciestree: Species tree
  5. clusterassignments: Text file with an initial set of clusters for each species. This requires that the cluster IDs for each species be identical. One way of generating these clusters is to merge the data across species, cluster the merged data and then project the cluster assignments on to each individual species.
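  6. rand: a random seed for the initial partitioning of the data, if needed. If this is set to rand, the input in "clusterassignments" is ignored and the initial clusters are random (as in Example Usage 2). If it is set to none, the clusters in "clusterassignments" are used (as in Example Usage 1).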
  7. outputDir: output directory for storing results
  8. mode [learn|generate]: takes one of two options, "learn" or "generate". "learn" invokes the learning part of the algorithm, and "generate" invokes the generative model to produce data for simulation studies.
  9. srcSpecies: the species to which gene names are mapped. Usually a well-annotated species.
  10. inittype: initialization of the transition matrix. Takes one of two options, "uniform" or "branchlength". If "uniform" is selected, the value of the p_diagonal_nonleaf parameter should be a real-valued number used to initialize the diagonal elements of the transition matrix. If "branchlength" is selected, the value of the p_diagonal_nonleaf parameter is interpreted as the name of a file with branch lengths, which is used to initialize the transition matrix.
  11. p_diagonal_nonleaf: a real-valued probability or a file name
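
The constraints in the parameter list can be checked before a run is started. The following is a minimal sketch, not part of Arboretum: arboretum_cmd is a hypothetical wrapper that checks the argument count, the mode and inittype values, and the form of p_diagonal_nonleaf, then prints the command line it would run. The numeric checks are deliberately loose.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of Arboretum): validate the 11 positional
# arguments described above and print the resulting command line.
arboretum_cmd() {
    if [ "$#" -ne 11 ]; then
        echo "error: expected 11 arguments, got $#" >&2
        return 1
    fi
    local maxk="$3"
    local mode="$8"
    local inittype="${10}"
    local p_diag="${11}"

    # maxk: digits only (loose check for a whole number).
    case "$maxk" in
        ''|*[!0-9]*) echo "error: maxk must be a whole number" >&2; return 1 ;;
    esac

    # mode: learn or generate.
    case "$mode" in
        learn|generate) ;;
        *) echo "error: mode must be learn or generate" >&2; return 1 ;;
    esac

    # inittype decides how p_diagonal_nonleaf is interpreted.
    case "$inittype" in
        uniform)
            # Expect a number; loose check that allows digits and dots only.
            case "$p_diag" in
                ''|*[!0-9.]*) echo "error: uniform needs a numeric p_diagonal_nonleaf" >&2; return 1 ;;
            esac
            ;;
        branchlength)
            # Expect the name of an existing branch-length file.
            if [ ! -f "$p_diag" ]; then
                echo "error: branch-length file not found: $p_diag" >&2
                return 1
            fi
            ;;
        *) echo "error: inittype must be uniform or branchlength" >&2; return 1 ;;
    esac

    # Print the command instead of running it.
    printf './arboretum'
    printf ' %s' "$@"
    printf '\n'
}
```

Printing the command rather than executing it keeps the sketch safe to try without the binary. To actually run the program, the three final printf lines could be replaced with ./arboretum "$@".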
 
Example Usage 1: Uniform initialization of transition matrices, and initial clusters from merged data
./arboretum specorder_allclade.txt OGid_members.txt 5 species_prob.txt cluster_conf.txt none result_dir learn Scer uniform 0.8
Example Usage 2: Initialization of transition matrices from branch lengths, and random initial clusters
./arboretum specorder_allclade.txt OGid_members.txt 5 species_prob.txt cluster_conf.txt rand result_dir learn Scer branchlength bltree.txt
A tar file called example_inputs.tgz, containing the input configuration files used above, is available from the Data page.
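
To compare several uniform initializations, the Example Usage 1 command can be repeated with different p_diagonal_nonleaf values. The sketch below is illustrative only: sweep_pdiag, the DRY_RUN switch and the result_p<value> directory names are hypothetical, and it assumes the example input files are in the current directory. Creating the output directory beforehand is also an assumption, since the page does not say whether the program creates it.

```shell
#!/usr/bin/env bash
# Hypothetical sweep over p_diagonal_nonleaf for uniform initialization.
# DRY_RUN=1 (the default here) only prints the commands.
DRY_RUN=${DRY_RUN:-1}

sweep_pdiag() {
    local p outdir
    for p in "$@"; do
        outdir="result_p${p}"   # hypothetical naming scheme, one directory per value
        if [ "$DRY_RUN" -eq 1 ]; then
            echo "./arboretum specorder_allclade.txt OGid_members.txt 5 species_prob.txt cluster_conf.txt none $outdir learn Scer uniform $p"
        else
            mkdir -p "$outdir"  # assumption: creating it up front is harmless
            ./arboretum specorder_allclade.txt OGid_members.txt 5 \
                species_prob.txt cluster_conf.txt none "$outdir" learn Scer uniform "$p"
        fi
    done
}
```

For example, sweep_pdiag 0.6 0.7 0.8 0.9 prints four command lines. Setting DRY_RUN=0 before the call runs them instead.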
 
 
 