Given the large amount of sequence information currently available, and with more becoming available daily, methods are needed to identify sequences within the genome that have biological function. To this end, we have been developing algorithms to identify potential sequence motifs that might function in transcription and/or gene regulation. We constructed a large database containing all possible motifs and searched it with specialized code to identify motifs meeting criteria that might be appropriate to biological function. Our initial results were presented in a poster at a genomics meeting in November 2001. We have identified sets of genes that share common upstream motifs; these motifs are candidates for regulatory elements. We have also identified a number of motifs with unusual, non-uniform distributions in the genome. Their function is unknown at this time.
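As a rough illustration of the enumerative idea (not the database or search code described above), the following Python sketch lists every fixed-length motif found in a set of upstream sequences and keeps those shared by several genes. The function name, the motif length k, the min_genes threshold, and the toy sequences are all assumptions made for this example.

```python
from collections import defaultdict

def shared_kmers(upstream, k=6, min_genes=2):
    """Map each k-mer to the set of genes whose upstream region contains it,
    keeping only k-mers found in at least `min_genes` genes."""
    genes_by_kmer = defaultdict(set)
    for gene, seq in upstream.items():
        seq = seq.upper()
        # Slide a window of width k across the upstream sequence.
        for i in range(len(seq) - k + 1):
            genes_by_kmer[seq[i:i + k]].add(gene)
    return {kmer: genes for kmer, genes in genes_by_kmer.items()
            if len(genes) >= min_genes}

# Toy, made-up upstream sequences (hypothetical gene names).
upstream = {
    "geneA": "TTGACGTCATT",
    "geneB": "CCTGACGTCAGG",
    "geneC": "AAAAAAAAAAAA",
}
shared = shared_kmers(upstream, k=6, min_genes=2)
for kmer in sorted(shared):
    print(kmer, sorted(shared[kmer]))
```

Exact-match k-mers are the simplest possible motif model; real regulatory motifs are usually degenerate, so this sketch only conveys the shape of the search.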
Currently, we are focusing our efforts on a different approach that uses BEAM (Beam searching Enumerative Algorithm for Motif finding) to identify motifs in sets of genes. The BEAM approach examines only a small subset of all possible motifs, yet identifies the vast majority of the true positives under appropriate conditions. Its advantage over other enumerative methods is that it eliminates almost all false positive results and trims the search space by orders of magnitude, while also showing very low false negative rates compared to non-enumerative methods. We have been able to demonstrate that our BEAM approach does better than all current algorithms in identifying true biologically active motifs. Papers describing this work are currently being written. This research was presented in a poster at a Computational Genomics meeting in October 2004.
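To make the general beam search idea concrete, here is a minimal Python sketch. It is a generic beam search over exact motif strings, not the published BEAM algorithm: the scoring function (the number of sequences containing the motif), the one-base-at-a-time extension, the beam_width parameter, and the toy sequences are all assumptions chosen for illustration.

```python
def coverage(motif, seqs):
    """Placeholder score: number of sequences that contain the motif."""
    return sum(motif in s for s in seqs)

def beam_search_motifs(seqs, length=6, beam_width=4, alphabet="ACGT"):
    """Grow motifs one base at a time, keeping only the best
    `beam_width` candidates after each extension step."""
    beam = [""]
    for _ in range(length):
        # Extend every surviving motif by each possible base.
        candidates = {m + b for m in beam for b in alphabet}
        # Rank by score (descending), breaking ties alphabetically.
        ranked = sorted(candidates, key=lambda m: (-coverage(m, seqs), m))
        beam = ranked[:beam_width]
    return [(m, coverage(m, seqs)) for m in beam]

# Toy, made-up sequences that all contain GACGTC.
seqs = ["TTGACGTCATT", "CCTGACGTCAGG", "AAGACGTCTT"]
result = beam_search_motifs(seqs, length=6, beam_width=4)
print(result[0])
```

With these settings the search scores at most beam_width × 4 candidates per step, rather than all 4^6 = 4096 possible six-base motifs. The trade-off is that a narrow beam can discard a prefix of a good motif early, which is why beam width matters.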
A third approach we are beginning to explore uses a new kind of visualization method to display sets of motifs for any given set of genes. In this approach, we allow the human brain to make some distinctions visually, which can then be followed up with actual sequence analysis.
DNA microarrays make it possible to analyze the expression patterns of thousands of genes simultaneously. In a typical experiment, one might examine the level of expression of thousands of genes under many different conditions or at many time points. One of the key goals of this technology is to understand how genes are regulated. Groups of genes whose expression rises and falls together are said to be coordinately regulated. We are taking data from genes that appear to be coordinately regulated in microarray experiments and then examining those genes for common regulatory motifs using the BEAM approach we have developed. We can identify shared motifs among the various genes based on motif occurrences, positions, or other properties of the motif. Post-processing of the common motifs allows us to learn more about the starting set of genes: Are they all really regulated in the same way by the same factors? Do some of the genes not really belong in this set? Does the set represent a single group of coordinately regulated genes, or does it contain multiple sets of genes that are regulated in parallel by different mechanisms?
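One simple way to approach the post-processing questions above can be sketched in Python: group genes by which candidate motifs occur in their upstream regions. This is only an illustrative sketch, not the post-processing actually used; the function names, the exact-match test, the motif list, and the toy sequences are assumptions.

```python
from collections import defaultdict

def motif_profile(seq, motifs):
    """Return the subset of motifs that occur in seq, as a frozenset."""
    return frozenset(m for m in motifs if m in seq)

def group_by_profile(upstream, motifs):
    """Group genes whose upstream regions contain the same set of motifs."""
    groups = defaultdict(list)
    for gene, seq in upstream.items():
        groups[motif_profile(seq.upper(), motifs)].append(gene)
    return dict(groups)

# Toy, made-up data: four genes from one hypothetical expression cluster.
upstream = {
    "g1": "TTGACGTCATT",
    "g2": "CCTGACGTCAGG",
    "g3": "GGCACGTGCC",
    "g4": "TTTTTTTT",
}
motifs = ["GACGTC", "CACGTG"]
groups = group_by_profile(upstream, motifs)
for profile, genes in groups.items():
    print(sorted(profile), genes)
```

In this toy case, g1 and g2 share one motif, g3 carries a different one (hinting at a second, parallel mechanism), and g4 carries neither (a candidate for not belonging in the set).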
Carlson, J.M., Chakravarty, A., and Gross, R.H. BEAM: A beam search algorithm for the identification of cis-regulatory elements in groups of genes. J. Comp. Biol. 13:686-701 (2006)
Carlson, J.M., Chakravarty, A., Khetani, R.S., and Gross, R.H. Bounded search for de novo identification of degenerate cis-regulatory elements. BMC Bioinformatics 7:254-285 (2006)
The DNA Inspector II, a package of programs to manipulate and analyze DNA sequences on the Apple Macintosh (Jan. 1986). Upgrade to DNA Inspector II+ (April 1987), DNA Inspector IIe (Sept. 1988).
Gene Communicator, a program for communicating with and utilizing BioNet and GenBank (Nov. 1986).
Techniques in Molecular Genetics, a HyperCard stack describing most of the techniques in use in molecular genetics today. Used as a reference and study aid. (1989)
The Gene Construction Kit, a program for manipulating and displaying DNA sequences and constructs. (1990)
The Gene Inspector, a sequence analysis program and electronic notebook. (1996)
The Gene Construction Kit 2 (1997)
The Gene Inspector 1.5 (1999)
The Gene Construction Kit 2.5* (2002)
* available from Textco BioSoftware, Inc.