Arboretum: reconstruction and analysis of the evolutionary history of condition-specific transcriptional programs

Sushmita Roy1,2,3, Ilan Wapinski1,4, Jenna Pfiffner1, Courtney French1, Amanda Socha1, Jay Konieczka1, Naomi Habib6, Manolis Kellis1,2, Dawn Thompson1, Aviv Regev1,5

Arboretum is an algorithm for identifying modules across species. Modules are defined as sets of genes that are co-expressed. Arboretum can reconstruct the module membership of genes in extant as well as ancestral species.

Here you can download the software for applying Arboretum to different multi-species datasets.

Comparative functional genomics studies the evolution of biological processes by analyzing functional data, such as gene expression profiles, across species. A major challenge is analyzing genomic profiles collected in a complex phylogeny. Here, we present Arboretum, a novel and scalable computational algorithm that integrates expression data from multiple species with species and gene phylogenies to infer modules of co-expressed genes in extant and ancestral species, their evolutionary histories and ancestral states. We develop new measures that use the reconstructed histories to determine the patterns of conservation and divergence in gene regulatory modules, and assess the impact of changes in gene content and copy number on module evolution. We used Arboretum to study the evolution of the transcriptional response to heat shock in eight species of Ascomycota fungi. We found that although modules and their expression are largely conserved, there is substantial divergence in the most induced module. Divergence of module organization was facilitated by gene duplication, with at least one of the members of the paralogous pair typically changing its module assignment following duplication. Compared to the transcriptional response to glucose depletion in the same species, the heat shock response diverged more substantially. Nevertheless, both responses exhibit the same general properties of module conservation and divergence, suggesting these are general principles of evolution of transcriptional programs. Arboretum and its associated analyses provide a comprehensive framework to systematically study regulatory evolution of condition-specific responses.
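
To make the idea of module conservation concrete, the toy sketch below computes the fraction of shared orthogroups that receive the same module assignment in two species. This is not Arboretum's own measure or code: the function name `module_conservation`, the orthogroup IDs, and the module indices are all hypothetical, and the sketch assumes module indices are directly comparable across species.

```python
from typing import Dict


def module_conservation(assign_a: Dict[str, int], assign_b: Dict[str, int]) -> float:
    """Fraction of shared orthogroups assigned to the same module in both species.

    Illustrative only: a simple agreement score, not the measure defined by Arboretum.
    """
    # Only orthogroups present in both species can be compared.
    shared = assign_a.keys() & assign_b.keys()
    if not shared:
        return 0.0
    same = sum(1 for og in shared if assign_a[og] == assign_b[og])
    return same / len(shared)


# Hypothetical module assignments (orthogroup ID -> module index).
species_a = {"OG1": 0, "OG2": 1, "OG3": 2, "OG4": 2}
species_b = {"OG1": 0, "OG2": 1, "OG3": 0, "OG5": 1}

print(module_conservation(species_a, species_b))
```

Orthogroups missing from either species (here `OG4` and `OG5`) are ignored, so gene loss or gain does not count as divergence in this simplified score.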
1 Broad Institute of MIT and Harvard, 7 Cambridge Center, Cambridge MA, 02142
2 Computer Science and Artificial Intelligence Laboratory (CSAIL), Massachusetts Institute of Technology (MIT), Cambridge, MA 02139
3 Biostatistics and Medical Informatics, University of Wisconsin, Madison, WI 53715
4 Department of Systems Biology, Harvard Medical School, Boston MA, 02140
5 Howard Hughes Medical Institute, Department of Biology, Massachusetts Institute of Technology, Cambridge MA, 02140
6 School of Computer Science & Engineering, Hebrew University, Jerusalem, 91904, Israel