A predictive modeling approach for cell-line specific long-range regulatory interactions

Sushmita Roy1,2, Alireza Fotuhi Siahpirani3, Deborah Chasman1, Sara Knaack1, Ferhat Ay4, Ron Stewart5, Michael Wilson6, Rupa Sridharan1,7
Long-range regulatory interactions among distal enhancers and target genes are important determinants of tissue-specific gene expression. Genome-scale identification of these interactions in a cell type-specific manner, especially using the fewest possible datasets, is a significant challenge. We develop a novel computational approach, Regulatory Interaction Prediction for Promoters and Long-range Enhancers (RIPPLE), that combines random forests and structured sparsity-based multi-task learning to predict cell type-specific enhancer-promoter interactions. RIPPLE integrates published 5C data with diverse regulatory genomic datasets in a supervised learning framework, first identifying a minimal set of features needed to predict long-range regulatory interactions. Our results suggest that CTCF, RAD21, a general transcription factor (TBP), and chromatin marks are important determinants of enhancer-promoter interactions. We used RIPPLE within an ensemble approach to predict genome-wide interaction maps in cell lines with available 5C data as well as in new cell lines. Computational validation of these predictions using existing ChIA-PET and Hi-C datasets showed that RIPPLE accurately predicts both long- and short-range interactions. We identified different classes of interacting enhancers and promoters that represent different combinations of architectural proteins and chromatin marks. Enhancer-promoter interactions in the high-confidence interaction networks tend to be organized into subnetworks that are significantly enriched in housekeeping and cell type-specific functions. Overall, our approach and the associated genome-wide predictions provide a useful resource for understanding long-range gene regulation across multiple cell types.
1. Wisconsin Institute for Discovery, 330 N. Orchard Street, Madison, WI 53715, USA
2. Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI 53715, USA
3. Department of Computer Sciences, University of Wisconsin-Madison, Madison, WI 53715, USA
4. Department of Genome Sciences, University of Washington, Seattle, WA 98195-5065, USA
5. Morgridge Institute for Research, 330 N. Orchard Street, Madison, WI 53715, USA
6. Genetics and Genome Biology Program, Hospital for Sick Children (SickKids) and Department of Molecular Genetics, University of Toronto, Toronto, ON M5G 1L7, Canada
7. Department of Cell and Regenerative Biology, University of Wisconsin-Madison, Madison, WI 53715, USA