Supplemental website for CMINT



OVERVIEW

We provide two programs: cmint_array for analyzing array data and cmint_seq for analyzing sequencing data. cmint_array clusters chromatin mark profiles from array data, where the data do not contain many zeros and the variance can be estimated. cmint_seq clusters chromatin mark profiles from sequencing data; its input is assumed to be log-transformed and normalized, and it does not estimate the variance. cmint_seq is also more memory efficient than cmint_array. Both programs take similar inputs and produce similar outputs.

USING CMINT

The cmint_array and cmint_seq executables take 11 positional arguments and are run as follows (use ./cmint_seq in place of ./cmint_array for sequencing data):

./cmint_array celltype_order genegroup maxk celllineage clusterassignments rand[rseed|none] outputDir mode[learn|generate] srcnode inittype[uniform|branchlength] p_diagonal_nonleaf

The input arguments are as follows:

celltype_order: A file describing the order of the cell types and is needed to parse the genegroup file. Example: ../data/reprogramming/specorder.txt
genegroup: A file listing the groups of genes/regions. Each line gives the name of a group followed by a comma-separated list of unique region identifiers, one per cell type, in the order specified by celltype_order. Example: ../data/ogids_notfilterexp.txt
maxk: The maximum number of clusters
celllineage: A file describing the tree of cell types. It is a two-column, tab-separated file with one line per branch: the first column is the name of the child and the second column is the name of its parent. Example: ../data/reprogramming/celltype_tree3_ancestor.txt
config: A file describing the locations of the initial cluster assignments and the mark data (this is the clusterassignments argument in the usage line). The cluster assignment files can also be used to specify which regions the CMINT algorithm should use. Example: ../data/reprogramming/config_k15.txt
rand: Either a random seed (rseed) or none
outputDir: The directory where results are written
mode: Run CMINT in learn mode or in generate (sampling) mode
srcnode: The cell type used as the reference for specifying the rows in some of the output files
inittype: How the transition matrices are initialized, either uniform or branchlength
p_diagonal_nonleaf: If inittype is uniform, p_diagonal_nonleaf is a number between 0 and 1 giving the probability that a region/gene keeps its module assignment. If inittype is branchlength, p_diagonal_nonleaf is a file specifying the transition probabilities. WARNING: only the uniform inittype has been tested with CMINT.
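For scripted runs, the positional arguments can be assembled in the order of the usage line. The following is a minimal Python sketch; build_cmint_command is a hypothetical helper, not part of CMINT, and it only checks the value sets visible in the usage line (learn/generate, uniform/branchlength) plus the 0 to 1 range for a uniform p_diagonal_nonleaf. The srcnode value in the example is a placeholder.

```python
from typing import List, Optional

# Value sets copied from the usage line: mode[learn|generate], inittype[uniform|branchlength].
VALID_MODES = {"learn", "generate"}
VALID_INITTYPES = {"uniform", "branchlength"}


def build_cmint_command(
    executable: str,
    celltype_order: str,
    genegroup: str,
    maxk: int,
    celllineage: str,
    config: str,
    rseed: Optional[int],
    output_dir: str,
    mode: str,
    srcnode: str,
    inittype: str,
    p_diagonal_nonleaf: str,
) -> List[str]:
    """Hypothetical helper: return argv for cmint_array/cmint_seq in usage-line order."""
    if maxk < 1:
        raise ValueError("maxk must be a positive number of clusters")
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_MODES)}, got {mode!r}")
    if inittype not in VALID_INITTYPES:
        raise ValueError(f"inittype must be one of {sorted(VALID_INITTYPES)}, got {inittype!r}")
    if inittype == "uniform":
        # With uniform initialization the last argument is a probability in [0, 1].
        p = float(p_diagonal_nonleaf)
        if not 0.0 <= p <= 1.0:
            raise ValueError("p_diagonal_nonleaf must lie between 0 and 1")
    # The rand argument is either a seed or the word "none".
    rand = "none" if rseed is None else str(rseed)
    return [
        executable, celltype_order, genegroup, str(maxk), celllineage, config,
        rand, output_dir, mode, srcnode, inittype, p_diagonal_nonleaf,
    ]


if __name__ == "__main__":
    cmd = build_cmint_command(
        "./cmint_array",
        "../data/reprogramming/specorder.txt",
        "../data/ogids_notfilterexp.txt",
        15,
        "../data/reprogramming/celltype_tree3_ancestor.txt",
        "../data/reprogramming/config_k15.txt",
        None,
        "results",
        "learn",
        "CELLTYPE_A",  # hypothetical srcnode; use a name from your celltype_order file
        "uniform",
        "0.8",
    )
    print(" ".join(cmd))
```

The returned list can be passed directly to subprocess.run(cmd, check=True), which avoids shell quoting issues with paths.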
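The celltype_order, genegroup, and celllineage files can be read with a few lines of Python, which is a convenient way to sanity-check inputs before a run. This sketch rests on assumptions the text does not spell out: celltype_order is taken to hold one cell type name per line, each genegroup line is taken to separate the group name from its comma-separated identifiers by whitespace, and the root of the lineage tree is taken to be the one cell type that appears only in the parent column. All function names are hypothetical.

```python
from typing import Dict, Iterable, List, Set


def read_celltype_order(lines: Iterable[str]) -> List[str]:
    # Assumption: one cell type name per non-empty line.
    return [ln.strip() for ln in lines if ln.strip()]


def read_genegroups(lines: Iterable[str], order: List[str]) -> Dict[str, Dict[str, str]]:
    """Map group name -> {cell type: region id}, pairing ids with celltype_order."""
    groups: Dict[str, Dict[str, str]] = {}
    for lineno, ln in enumerate(lines, start=1):
        if not ln.strip():
            continue
        # Assumption: whitespace separates the group name from the id list.
        parts = ln.split(None, 1)
        if len(parts) != 2:
            raise ValueError(f"line {lineno}: expected a group name and an id list")
        name, id_field = parts
        ids = [x.strip() for x in id_field.strip().split(",")]
        if len(ids) != len(order):
            raise ValueError(f"line {lineno}: expected {len(order)} ids, got {len(ids)}")
        groups[name] = dict(zip(order, ids))
    return groups


def read_lineage(lines: Iterable[str]) -> Dict[str, str]:
    """Map child -> parent from the two-column, tab-separated celllineage file."""
    parent_of: Dict[str, str] = {}
    for lineno, ln in enumerate(lines, start=1):
        if not ln.strip():
            continue
        fields = ln.rstrip("\r\n").split("\t")
        if len(fields) != 2:
            raise ValueError(f"line {lineno}: expected 2 tab-separated columns")
        parent_of[fields[0].strip()] = fields[1].strip()
    return parent_of


def find_root(parent_of: Dict[str, str]) -> str:
    # Assumption: the root appears only in the parent column.
    roots = {p for p in parent_of.values() if p not in parent_of}
    if len(roots) != 1:
        raise ValueError(f"expected exactly one root, found {sorted(roots)}")
    return roots.pop()


def find_leaves(parent_of: Dict[str, str]) -> Set[str]:
    # Leaves are cell types that never appear as a parent.
    parents = set(parent_of.values())
    return {c for c in parent_of if c not in parents}
```

For example, opening the order file and passing the file handle to read_celltype_order yields the cell type list, which is then passed to read_genegroups.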
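To make the uniform option concrete: with maxk clusters, p_diagonal_nonleaf sets the diagonal of an initial transition matrix, i.e. the probability that a region keeps its cluster assignment. How CMINT distributes the remaining probability is not described here; the sketch below assumes an even split over the other maxk - 1 clusters and a row-per-parent-cluster layout. The function name is hypothetical and this is not CMINT's own initialization code.

```python
from typing import List


def uniform_transition_matrix(k: int, p_diagonal: float) -> List[List[float]]:
    """Hypothetical k x k initial transition matrix.

    Row i is taken as the distribution over child clusters given parent cluster i.
    The diagonal is p_diagonal (keep the assignment); the remainder is split evenly
    over the other k - 1 clusters, which is an assumption, not documented behavior.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if not 0.0 <= p_diagonal <= 1.0:
        raise ValueError("p_diagonal must lie between 0 and 1")
    if k == 1:
        # A single cluster can only transition to itself.
        return [[1.0]]
    off = (1.0 - p_diagonal) / (k - 1)
    return [[p_diagonal if i == j else off for j in range(k)] for i in range(k)]


if __name__ == "__main__":
    for row in uniform_transition_matrix(3, 0.8):
        print(["%.2f" % x for x in row])
```

Under this assumption, maxk = 3 and p_diagonal_nonleaf = 0.8 give 0.8 on the diagonal and 0.1 elsewhere, so every row sums to 1.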

Please see cmint_src/README.txt for detailed examples of using CMINT on different datasets.

UTILITIES FOR PROCESSING CMINT OUTPUTS

The utils.tgz archive contains programs and scripts that can be used to generate gene sets exhibiting different transitions. Please refer to the README.txt included in utils.tgz.
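As an illustration of what a transition-based gene set is, the sketch below groups genes by the pair (cluster in a parent cell type, cluster in a child cell type) and collects the genes whose assignment changes. It is not one of the utils.tgz programs; the input dictionaries and function names are hypothetical stand-ins for cluster assignments read from CMINT output.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple


def genes_by_transition(
    parent_assign: Dict[str, int], child_assign: Dict[str, int]
) -> Dict[Tuple[int, int], Set[str]]:
    """Group genes by (parent cluster, child cluster); genes absent from the child are skipped."""
    groups: Dict[Tuple[int, int], Set[str]] = defaultdict(set)
    for gene, src in parent_assign.items():
        dst = child_assign.get(gene)
        if dst is None:
            continue
        groups[(src, dst)].add(gene)
    return dict(groups)


def changed_genes(groups: Dict[Tuple[int, int], Set[str]]) -> Set[str]:
    # Genes whose cluster differs between the parent and the child cell type.
    return set().union(*(genes for (src, dst), genes in groups.items() if src != dst))
```

Each key of the returned dictionary names one transition, so a gene set for a specific transition is a single lookup, for example groups[(0, 2)].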