Bob Gross' Lab

Computational Molecular Biology, Gene Regulation

Identifying Regulatory Elements in a Genome

Research in the lab is aimed at developing new ways of using computational approaches to answer biological questions. Currently, our major focus is discovering and exploring regulatory motifs in genomes: the sequences involved in controlling gene expression in response to internal or external signals. The long-term goal is to apply these approaches to understanding gene regulatory networks.

Regulatory motifs (transcription factor binding sites) often consist of a number of short sequences containing ambiguous nucleotides, sometimes split into several pieces. To address this problem, we have developed new algorithms that identify sets of short sequence motifs present in a group of co-regulated genes. Our BEAM, SPACER, PRISM and SCOPE algorithms consistently outperform other approaches and can identify putative regulatory sequences even when the data contain high levels of noise. This makes them particularly useful for identifying motifs in sets of genes identified through microarray experiments. This poster gives an overview of SCOPE (see the SCOPE web site).

How SCOPE works

Motif Finding

There are three types of motifs that need to be identified:
    non-degenerate
    degenerate
    degenerate multipart

We developed different algorithms to find each of these motif types and combined them into a single program called SCOPE (Suite for Computational identification Of Promoter Elements). BEAM finds non-degenerate motifs, PRISM finds degenerate motifs, and SPACER finds degenerate motifs that have “spacer” regions internally. SCOPE runs all three of these algorithms and merges the output to produce a list of the best scoring motifs of any type.
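To make the three motif types concrete, here is a minimal Python sketch, assuming motifs are written as IUPAC consensus strings: a non-degenerate motif uses only A, C, G and T; a degenerate motif adds ambiguity codes such as W (A or T); and a multipart motif has a run of N "spacer" positions between its parts. The function names and the example spacer motif are hypothetical, and this is not how SCOPE represents motifs internally.

```python
import re

# IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def consensus_to_regex(consensus: str) -> str:
    """Translate an IUPAC consensus into a regular expression.

    Fixed bases stay literal; ambiguity codes (including the N positions
    of a spacer) become character classes such as [AT].
    """
    parts = []
    for code in consensus.upper():
        bases = IUPAC[code]
        parts.append(bases if len(bases) == 1 else f"[{bases}]")
    return "".join(parts)


def count_sites(consensus: str, sequence: str) -> int:
    """Count possibly overlapping matches on the given strand only."""
    # A zero-width lookahead lets overlapping sites each be counted.
    pattern = re.compile(f"(?={consensus_to_regex(consensus)})")
    return len(pattern.findall(sequence.upper()))


if __name__ == "__main__":
    print(consensus_to_regex("CTCATC"))    # non-degenerate
    print(consensus_to_regex("AAAWTTTT"))  # degenerate: AAA[AT]TTTT
    print(count_sites("ACGNNCGT", "ACGTTCGTACGAACGT"))  # multipart: 2
```

Only one strand is scanned here; a real motif finder would normally also consider the reverse complement.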

These algorithms dramatically reduce the search space needed to identify such elements by using a beam search and different objective functions. The algorithms were presented in a poster at the Computational Genomics meeting in Reston, Virginia in October 2004. The results from SCOPE reflect not only the overrepresentation of a motif in the set of target genes compared to background, but also the position distribution of the motif upstream of those genes. For example, if a motif occurs at position -27 in front of all eight genes in a set, it would score significantly higher than if it had no preferred location.
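The idea of pruning the search with a beam can be sketched in a few lines of Python. This is a minimal sketch, not BEAM's actual objective function: it assumes a simple hypothetical score (the log ratio of the fraction of target versus background sequences containing a word, with pseudocounts, computed from direct counts), grows candidate words one base at a time on either end, and keeps only the `beam_width` best candidates at each length. The position-distribution component that SCOPE also scores is left out. All names and parameter values are illustrative.

```python
import math
from itertools import product

ALPHABET = "ACGT"


def score(word, targets, background, pseudo=1.0):
    """Hypothetical over-representation score from direct counts.

    Log ratio of the (pseudocounted) fraction of target sequences that
    contain `word` to the same fraction among background sequences.
    """
    t = sum(word in s for s in targets)
    b = sum(word in s for s in background)
    t_frac = (t + pseudo) / (len(targets) + 2 * pseudo)
    b_frac = (b + pseudo) / (len(background) + 2 * pseudo)
    return math.log(t_frac / b_frac)


def beam_search(targets, background, seed_len=3, max_len=8, beam_width=10):
    """Return the best-scoring word found by a simple beam search."""

    def ranked(words):
        # Highest score first; ties broken alphabetically for determinism.
        scored = [(score(w, targets, background), w) for w in words]
        return sorted(scored, key=lambda sw: (-sw[0], sw[1]))[:beam_width]

    # Start from every possible word of length seed_len.
    beam = ranked("".join(p) for p in product(ALPHABET, repeat=seed_len))
    best_score, best = beam[0]
    for _ in range(seed_len, max_len):
        # Extend each surviving word by one base on the left or right.
        candidates = {c + w for _, w in beam for c in ALPHABET}
        candidates |= {w + c for _, w in beam for c in ALPHABET}
        beam = ranked(candidates)
        # Prefer a longer word when it scores at least as well.
        if beam[0][0] >= best_score:
            best_score, best = beam[0]
    return best
```

With a beam of width 10, at most 80 candidates are scored at each length, whereas exhaustively scoring every 8-mer would mean evaluating 65,536 words.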

Modeling and Identifying Regulatory Modules across Fungal Evolution

Regulatory motifs are typically identified by their overrepresentation in the upstream regions of target genes compared to the upstream regions of the rest of the genome (non-target genes). However, there are often other biologically relevant metrics, such as the position distribution of the motif in the upstream region and its preference for a particular orientation with respect to the transcription start site, each compared to background. We used the motif finder SCOPE, which considers these criteria, as a starting point to identify potential regulatory modules in sets of genes. We define a module as two motif instances that occur in a conserved pattern in the genes they regulate. Modules may be homotypic (two instances of the same motif) or heterotypic (two different motifs). To identify biologically relevant modules, we consider four criteria: (1) intermotif distances, (2) motif orientations, (3) the position distribution of the module upstream of the genes it regulates, and (4) the component motif scores from SCOPE, which reflect each motif's overrepresentation and position distribution. These four metrics are combined into a module score that indicates the quality of a candidate module.
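The first three module criteria can be extracted from sequence with straightforward bookkeeping. The Python sketch below is a simplified illustration, assuming exact-match motifs, upstream sequences whose 3' end serves as the reference point for positions, and a hypothetical `max_gap` cutoff. It records the intermotif distance, both orientations and the position of each candidate pair, then reports the distance/orientation pattern shared by the most genes. It does not reproduce the module score itself, whose weighting of the four criteria is not described here; all function names are hypothetical.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_sites(motif, seq):
    """(start, strand) for every exact occurrence on either strand."""
    k, rc = len(motif), revcomp(motif)
    sites = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window == motif:
            sites.append((i, "+"))
        elif window == rc:
            sites.append((i, "-"))
    return sites


def module_features(motif_a, motif_b, seq, max_gap=50):
    """(distance, strand_a, strand_b, position) for each nearby A/B pair.

    distance is start(B) - start(A); position is start(A) measured
    from the 3' end of the sequence, so it is negative (e.g. -27).
    """
    features = []
    for a_start, a_strand in find_sites(motif_a, seq):
        for b_start, b_strand in find_sites(motif_b, seq):
            d = b_start - a_start
            # d == 0 would pair a site with itself in a homotypic module.
            if 0 < abs(d) <= max_gap:
                features.append((d, a_strand, b_strand, a_start - len(seq)))
    return features


def conserved_pattern(motif_a, motif_b, seqs, max_gap=50):
    """Most common (distance, strand_a, strand_b) and the fraction of genes with it."""
    counts = Counter()
    for seq in seqs:
        # Count each pattern at most once per gene.
        counts.update({f[:3] for f in module_features(motif_a, motif_b, seq, max_gap)})
    if not counts:
        return None, 0.0
    pattern, n_genes = counts.most_common(1)[0]
    return pattern, n_genes / len(seqs)
```

Scoring the pattern per gene, rather than per pair, keeps one gene with many repeated sites from dominating the estimate of how well a spacing is conserved.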

This approach has been used to identify regulatory modules in the genes involved in ribosome biogenesis and assembly (RBA) in 39 different fungal species. We have been able to show the relationships between modules and evolutionary history across the fungal tree. In addition, we have used the module found in the RBA genes in S. cerevisiae to discover other genes in that species that contain the same module. These other genes are coexpressed with the RBA genes and have functions compatible with a role in RBA. Thus, we have been able to model and recognize modules in order to characterize and uncover other genes that might be involved in the same biological process. We suggest that this approach may be revealing a higher order of biological regulatory signal that can be used to better understand and identify gene regulatory mechanisms.

The computational discovery and evolutionary implications of regulatory motif patterns responsible for transcriptional activation of ribosome biogenesis

Identifying two or more motifs that occur in modular patterns

Proteins involved in ribosome biogenesis (RB) are responsible for the formation, assembly and transport of ribosomal constituents. We examined the conservation and evolution of their transcriptional regulation across fungal phylogeny. Understanding fungal gene regulation matters because fungi are important in agriculture and medicine.

Our approach is based on SCOPE, an ensemble motif finder that combines three search strategies to look for regulatory motifs. We analyzed the upstream regions of genes orthologous to Saccharomyces cerevisiae RB genes in 25 other fungal species. We then compared the predicted motifs to the RRPE and PAC motifs, which are known to be important in S. cerevisiae. The best matches were analyzed to identify evolutionarily specific motif patterns.

The motifs were generally found in the 0-200 bp region of the upstream sequence, displaying a non-random distribution. Four different patterns were observed, based on enrichment for one of the two motifs or for their combination. Motif combinations (modules) were observed in two groups of closely related fungi (six S. cerevisiae-like and five Neurospora crassa-like species). These modules typically occurred in a preferred motif order, which differed between the two groups. Statistical examination of the distance between motifs in modules showed highly significant conservation of the intermotif distances. While both fungal groups with modules displayed the RRPE motif (AAA[AT]TTTT), they showed slightly different versions of the PAC motif: CTCATC for S. cerevisiae-like species and CTTATC for N. crassa-like species. Thus, there are at least two distinct modular patterns in different branches of fungal evolution.
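As a concrete, hedged illustration, the Python sketch below scans upstream sequences for the consensus sequences reported above (RRPE AAA[AT]TTTT and the two PAC variants) and records which motif comes first and the start-to-start distance between them. It assumes exact consensus matches on the given strand only and uses just the first site of each motif; the function names are hypothetical, and this is not the statistical test used in the study.

```python
import re
from collections import Counter

# Consensus sequences as given in the text.
RRPE = re.compile(r"AAA[AT]TTTT")
PAC = {"S. cerevisiae-like": "CTCATC", "N. crassa-like": "CTTATC"}


def module_order(seq, pac):
    """Return (order, distance) for the first RRPE and PAC sites, or None.

    Only the given strand is scanned; distance is start-to-start in bp.
    """
    seq = seq.upper()
    rrpe = RRPE.search(seq)
    pac_start = seq.find(pac)
    if rrpe is None or pac_start < 0:
        return None
    order = "RRPE-PAC" if rrpe.start() < pac_start else "PAC-RRPE"
    return order, abs(pac_start - rrpe.start())


def summarize(seqs, pac):
    """Tally motif orders and collect intermotif distances over a gene set."""
    found = [r for r in (module_order(s, pac) for s in seqs) if r is not None]
    orders = Counter(order for order, _ in found)
    distances = [d for _, d in found]
    return orders, distances
```

A tight spread in the collected distances, relative to what shuffled or background sequences give, is the kind of signal that the intermotif-distance analysis looks for.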

These results were presented in a poster at the ISMB meeting in Boston in July 2010.

Using SCOPE to Understand Biological Systems

Neurospora Light Responsive Genes

We have applied SCOPE to the set of genes induced by light in Neurospora crassa, in collaboration with Chen-Hui Chen of Jay Dunlap's lab. Using microarray analyses, we identified a set of genes that respond quickly to light (15 minutes), which we call the early response genes, and a set of genes that respond after 45 minutes (late response genes). Each of these microarray-defined gene sets was analyzed using SCOPE, and candidate motifs were identified.

The early response motif identified by SCOPE corresponded to a previously identified motif. This early response motif was present an average of 13 times per gene and showed a distinctive spacing pattern between instances, with 40% of the instances lying 7 nucleotides from their neighbors. The late response motif was examined by gel shift assays and with luciferase reporter gene constructs, which showed that the identified motif was responsible for binding a specific transcription factor, sub-1, that is synthesized in the early response. Deletion mutants for this gene did not show the late response and did not synthesize a protein that bound to the late motif. Thus, SCOPE was able to computationally identify both an early and a late response element. These results were presented in a poster at the ISMB/ECCB meeting in Vienna in July 2007.
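The spacing statistic quoted above can be computed with a short Python sketch. Because the motif itself is not given here, the example uses a made-up placeholder motif, and it assumes that "distance to a neighbor" means the start-to-start distance between consecutive exact matches on one strand. Both are assumptions, and the function names are hypothetical.

```python
from collections import Counter


def site_starts(motif, seq):
    """Start positions of all (possibly overlapping) exact matches."""
    starts, i = [], seq.find(motif)
    while i != -1:
        starts.append(i)
        i = seq.find(motif, i + 1)
    return starts


def spacing_distribution(motif, seqs):
    """Counter of start-to-start distances between consecutive sites."""
    spacings = Counter()
    for seq in seqs:
        starts = site_starts(motif, seq)
        spacings.update(b - a for a, b in zip(starts, starts[1:]))
    return spacings


def fraction_at(spacings, distance):
    """Fraction of neighbor spacings equal to `distance`."""
    total = sum(spacings.values())
    return spacings[distance] / total if total else 0.0


if __name__ == "__main__":
    PLACEHOLDER = "GATCGA"  # hypothetical motif, for illustration only
    seq = "GATCGAAGATCGAAGATCGATTTTGATCGA"
    gaps = spacing_distribution(PLACEHOLDER, [seq])
    print(gaps, fraction_at(gaps, 7))
```

Applied to every early response gene, `fraction_at(gaps, 7)` gives the kind of figure reported above (the share of instances 7 nucleotides from a neighbor).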


Examining fungal regulatory sequences for the G1/S transition

We analyzed transcriptional systems involved in the G1/S cell cycle transition and in ribosome biogenesis and assembly. These systems comprise transcription factors and their target motifs in the upstream regions of the regulated genes. We investigated the conservation and divergence of these DNA regulatory motifs across fungal evolution.

S. cerevisiae gene sets were obtained either from microarray data or from the Saccharomyces Genome Database. Homologous genes in the other 24 fungal species were identified either by mining public databases or by performing reciprocal BLAST searches. The gene sets were analyzed by SCOPE to identify statistically significant motifs. Their upstream regions were subdivided into four quartiles (200 bp each) to analyze motif position preferences. High-scoring motifs with preferred upstream locations were examined in other species.
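The quartile analysis amounts to binning each site by its distance upstream. Here is a minimal Python sketch, assuming an 800-bp upstream region (four 200-bp quartiles), site positions given as distance upstream of the gene (1 to 800), quartile 1 nearest the gene, and a hypothetical 50% threshold for calling a preferred quartile. The function names and the threshold are illustrative, not the study's criteria.

```python
QUARTILE_BP = 200   # quartile size stated in the text
N_QUARTILES = 4     # so the region spans 800 bp


def quartile(distance_upstream):
    """Quartile (1-4) for a site 1..800 bp upstream; 1 is nearest the gene."""
    if not 1 <= distance_upstream <= QUARTILE_BP * N_QUARTILES:
        raise ValueError(f"position {distance_upstream} outside upstream region")
    return (distance_upstream - 1) // QUARTILE_BP + 1


def quartile_profile(distances):
    """Fraction of sites falling in each quartile."""
    counts = [0] * N_QUARTILES
    for d in distances:
        counts[quartile(d) - 1] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * N_QUARTILES


def preferred_quartile(distances, threshold=0.5):
    """Quartile holding at least `threshold` of the sites, else None.

    Under a uniform distribution each quartile would hold about 25%.
    """
    profile = quartile_profile(distances)
    best = max(range(N_QUARTILES), key=lambda q: profile[q])
    return best + 1 if profile[best] >= threshold else None
```

A motif whose sites cluster in one quartile across many genes is a candidate for the "preferred upstream location" follow-up in other species.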

This research was presented in a poster at the ISMB meeting in Toronto in July 2008.


Finding regulatory signals in 3' UTRs of coregulated mRNA sets

Dendritic cells provide a critical link between innate and adaptive immunity and are essential for priming a naive T-cell response. The transition from immature to mature dendritic cells involves numerous changes in gene expression; however, the role of post-transcriptional changes in this process has been largely ignored. Tristetraprolin (TTP) is an AU-rich element mRNA-binding protein that has been shown to regulate the stability of a number of cytokine and chemokine mRNAs. Using TTP immunoprecipitations and Affymetrix GeneChips, we identified 393 messages as putative TTP mRNA targets in human dendritic cells. Using an RNA version of SCOPE, among other approaches, we discovered new regulatory sequences. A novel finding is that TTP can interact with and regulate the expression of messages that lack AU-rich elements. The data implicate TTP as having a broader role in regulating and limiting the immune response than previously suspected.

This research is published in RNA 14: 888-902 (2008).

Previous development of the SCOPE algorithm components

We present a pair of algorithms representing a significant improvement in the de novo identification of cis-regulatory elements. Our consensus-based algorithms employ a beam search to provide a fast, accurate and robust solution requiring no inputs other than a list of unaligned upstream sequences and a species name. They embody a general statistical framework for cis-regulatory element identification that is independent of the objective function and species. The first algorithm, BEAM, limits its search space by focusing only on high-scoring candidate motifs. This enables the use of direct counts for background frequencies instead of probabilistic estimates. The second algorithm, PRISM, identifies highly degenerate cis-regulatory elements, based on our finding that the statistical over-representation of a highly degenerate cis-regulatory element can be expressed as a linear combination of the over-representation of its non-degenerate instantiations. PRISM takes non-degenerate motifs as its input, uses them to construct a consensus representing a set of closely related sequences, and then generates a Position Weight Matrix from the consensus. The BEAM-PRISM combination is extremely robust to noise, identifying cis-regulatory elements of arbitrary length and degeneracy in seconds. In comparison tests, BEAM-PRISM outperformed seven other motif-finding algorithms on 28 S. cerevisiae regulons.
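The additivity that PRISM relies on is easy to check numerically for the simplest linear statistic. The Python sketch below is an illustration under stated assumptions, not PRISM's actual objective: it uses a hypothetical "excess count" (observed target sites minus the number expected from the background site rate), expands an IUPAC consensus into its non-degenerate instances, and builds a simple count-weighted Position Weight Matrix with a hypothetical pseudocount. Because every window that matches a degenerate consensus matches exactly one of its instances, the excess of the consensus equals the sum of the instances' excesses. All function names are hypothetical.

```python
from itertools import product

# IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand(consensus):
    """All non-degenerate words matched by an IUPAC consensus."""
    return ["".join(p) for p in product(*(IUPAC[c] for c in consensus))]


def count_matches(consensus, seqs):
    """Number of windows (all sequences, one strand) matching the consensus."""
    k = len(consensus)
    return sum(
        all(base in IUPAC[code] for base, code in zip(s[i:i + k], consensus))
        for s in seqs
        for i in range(len(s) - k + 1)
    )


def excess(consensus, targets, background):
    """Hypothetical linear statistic: observed minus background-expected sites."""
    k = len(consensus)
    n_target = sum(len(s) - k + 1 for s in targets)
    n_background = sum(len(s) - k + 1 for s in background)
    expected = count_matches(consensus, background) * n_target / n_background
    return count_matches(consensus, targets) - expected


def pwm(instances, weights, pseudo=0.25):
    """Column-wise base frequencies from weighted non-degenerate instances."""
    matrix = []
    for pos in range(len(instances[0])):
        column = {base: pseudo for base in "ACGT"}
        for word, w in zip(instances, weights):
            column[word[pos]] += w
        total = sum(column.values())
        matrix.append({base: n / total for base, n in column.items()})
    return matrix


if __name__ == "__main__":
    targets = ["AAATTTAAAT", "GAATCAAAG"]
    background = ["ACGTACGTAA", "TTTTTTTAAT"]
    parts = expand("AAW")  # ['AAA', 'AAT']
    print(excess("AAW", targets, background))                      # 5.0625
    print(sum(excess(w, targets, background) for w in parts))      # 5.0625
    print(pwm(parts, [count_matches(w, targets) for w in parts]))
```

Statistics that are not linear in the counts (log-odds or p-values, for example) do not decompose this simply, which is why the sketch sticks to an excess count.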

TIGR Computational Genomics conference (2004).


  • BEAM: Carlson* JM, Chakravarty* A, and Gross RH. "BEAM: A beam search algorithm for the identification of cis-regulatory elements in groups of genes." Journal of Computational Biology, 13(3):686-701 (2006)

  • PRISM: Carlson* JM, Chakravarty* A, Khetani RS, and Gross RH. "Bounded search for de novo identification of degenerate cis-regulatory elements." BMC Bioinformatics, 7(1):254 (2006)

  • SPACER: Chakravarty A, Carlson JM, Khetani RS, DeZiel CE, and Gross RH. "SPACER: Robust identification of cis-regulatory elements with non-contiguous critical residues." Bioinformatics, 23:1029-1031 (2007)

  • SCOPE: Chakravarty A, Carlson JM, Khetani RS, and Gross RH. "A parameter-free algorithm for improved de novo identification of transcription factor binding sites." BMC Bioinformatics, 8:249 (2007)

  • SCOPE web interface: Carlson JM, Chakravarty A, DeZiel CE, and Gross RH. "SCOPE: a web server for practical de novo motif discovery." Nucleic Acids Research, doi:10.1093/nar/gkm310 (2007)

  • Identifying elements in mRNAs: Emmons, Townley-Tilson WHD, Deleault KM, Skinner SJ, Gross RH, Whitfield ML, and Brooks SA. "Identification of TTP mRNA targets in human dendritic cells reveals TTP as a critical regulator of dendritic cell maturation." RNA, 14:888-902 (2008)

  • Neurospora light response elements: Chen C-H, Ringelberg CS, Gross RH, Dunlap JC, and Loros JJ. "Genome-wide characterization of light-inducible responses reveals a hierarchical light-signaling cascade in Neurospora crassa." EMBO Journal, 28:1029-1042 (2009)

  • Quartile analysis: Martyanov V and Gross RH. "Identifying functional relationships within sets of co-expressed genes by combining upstream regulatory motif analysis and gene expression information." BMC Genomics, 11(Suppl 2):S8 (2010)

  • Finding extra genes (video): Martyanov V and Gross RH. "Using SCOPE to Identify Potential Regulatory Motifs in Coregulated Genes." Journal of Visualized Experiments, 51 (2011)

     * These authors contributed equally to this research.