We provide two sources: cmint_array for analyzing array data, and cmint_seq for analyzing sequencing data.
cmint_array is used for clustering chromatin mark profiles from array data where we do not have a lot of zeros and we can estimate the variance.
cmint_seq is used for clustering chromatin mark profiles from sequencing data. The input is assumed to be log-transformed and normalized.
We do not estimate the variance within cmint_seq. cmint_seq is also more memory efficient than cmint_array. Both cmint_array and cmint_seq
have similar inputs and outputs.
The cmint_array/cmint_seq executables take 11 positional arguments and can be run in the following manner:
./cmint_array celltype_order genegroup maxk celllineage clusterassignments rand[rseed|none] outputDir mode[learn|generate] srcnode inittype[uniform|branchlength] p_diagonal_nonleaf
The input arguments are as follows:
- celltype_order: A file describing the order of the cell types; it is needed to parse the genegroup file. Example: ../data/reprogramming/specorder.txt
- genegroup: A file listing the groups of genes/regions. Each entry consists of the name of the group followed by comma-separated unique region identifiers, one per cell type. The commas separate the different cell types, and the order of the cell types is specified by celltype_order.
Example: ../data/ogids_notfilterexp.txt
- maxk: The maximum number of clusters
- celllineage: A file describing the tree of cell types. The file has two tab-separated columns, one branch per line: the first column is the name of the child and the second column is the name of its parent. Example: ../data/reprogramming/celltype_tree3_ancestor.txt
- config (shown as clusterassignments in the usage line above): A file describing the locations of the initial cluster assignments and the mark data.
The cluster assignment files can be used to specify which regions should be used by the CMINT algorithm.
Example: ../data/reprogramming/config_k15.txt
- rand: Either a random seed (rseed) or none, as indicated by rand[rseed|none] in the usage line above.
- outputDir: Location of results
- mode: Run CMINT in learn or generate (sampling) mode.
- srcnode: The cell type that serves as the reference point relative to which the rows in some output files are specified.
- inittype: An input argument specifying how the transition matrices should be initialized
- p_diagonal_nonleaf: If inittype is uniform, p_diagonal_nonleaf is a number between 0 and 1 specifying the probability with which a region/gene maintains its module assignment. If inittype is branchlength, then p_diagonal_nonleaf is a file specifying the transition probabilities. WARNING: the only option tested with CMINT is inittype set to uniform.
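The two-column celllineage format described above can be checked before a run with a small script. The following is a minimal Python sketch and is not part of CMINT: the function name parse_lineage and the node names in the example are hypothetical, and it assumes that the root of the tree never appears in the child column.

```python
import io
from collections import defaultdict


def parse_lineage(lines):
    """Parse 'child<TAB>parent' lines into lookup tables.

    Returns (parent_of, children_of, roots), where roots are the nodes
    that appear as a parent but never as a child.
    """
    parent_of = {}
    children_of = defaultdict(list)
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) != 2:
            raise ValueError(f"expected 2 tab-separated columns, got: {line!r}")
        child, parent = fields
        parent_of[child] = parent
        children_of[parent].append(child)
    roots = sorted(p for p in children_of if p not in parent_of)
    return parent_of, dict(children_of), roots


# Hypothetical tree with made-up names; a real file such as
# ../data/reprogramming/celltype_tree3_ancestor.txt would be opened instead.
example = io.StringIO("leafA\tanc1\nleafB\tanc1\n")
parent_of, children_of, roots = parse_lineage(example)
print(roots)
```

A tree with exactly one entry in roots is what one would normally expect; zero or several entries may point to a formatting problem in the file.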
Please see cmint_src/README.txt for detailed examples of using CMINT on different datasets.
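As a rough illustration of how the positional arguments fit together, the Python sketch below assembles a command line in the order given by the usage line and checks the constraints stated in the argument list. It is not part of CMINT. The helper name build_cmint_command is hypothetical, and the values used for maxk, outputDir, srcnode and p_diagonal_nonleaf are placeholders chosen only for the example; the file paths are the examples listed above.

```python
import shlex

VALID_MODES = {"learn", "generate"}
VALID_INITTYPES = {"uniform", "branchlength"}


def build_cmint_command(executable, celltype_order, genegroup, maxk,
                        celllineage, config, rand, output_dir, mode,
                        srcnode, inittype, p_diagonal_nonleaf):
    """Return the argument list for one run, in usage-line order."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_MODES)}")
    if inittype not in VALID_INITTYPES:
        raise ValueError(f"inittype must be one of {sorted(VALID_INITTYPES)}")
    if inittype == "uniform":
        # With uniform initialization the last argument is a probability.
        p = float(p_diagonal_nonleaf)
        if not 0.0 <= p <= 1.0:
            raise ValueError("p_diagonal_nonleaf must lie between 0 and 1")
    return [executable, celltype_order, genegroup, str(maxk), celllineage,
            config, str(rand), output_dir, mode, srcnode, inittype,
            str(p_diagonal_nonleaf)]


argv = build_cmint_command(
    executable="./cmint_array",
    celltype_order="../data/reprogramming/specorder.txt",
    genegroup="../data/ogids_notfilterexp.txt",
    maxk=15,                      # placeholder value
    celllineage="../data/reprogramming/celltype_tree3_ancestor.txt",
    config="../data/reprogramming/config_k15.txt",
    rand="none",
    output_dir="results/",        # placeholder output location
    mode="learn",
    srcnode="SRC_CELLTYPE",       # placeholder; use a name from celltype_order
    inittype="uniform",
    p_diagonal_nonleaf=0.8,       # placeholder probability
)
print(shlex.join(argv))
```

The resulting list could be handed to subprocess.run once the placeholders are replaced with real values.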
Utilities to process outputs from CMINT runs
utils.tgz contains programs and scripts that can be used to generate gene sets that exhibit different transitions. Please refer to the README.txt in this directory.