We present a pair of algorithms that significantly improve the de novo identification of cis-regulatory elements. Our consensus-based algorithms employ a beam search to provide a fast, accurate, and robust solution that requires no inputs other than a list of unaligned upstream sequences and a species name. They embody a general statistical framework for cis-regulatory element identification that is independent of the objective function and species. The first algorithm, BEAM, limits its search space by focusing only on high-scoring candidate motifs. This enables the use of direct counts for background frequencies instead of probabilistic estimates. The second algorithm, PRISM, identifies highly degenerate cis-regulatory elements based on our finding that the statistical over-representation of a highly degenerate cis-regulatory element can be expressed as a linear combination of the over-representations of its non-degenerate instantiations. PRISM takes non-degenerate motifs as its input, uses them to construct a consensus representing a set of closely related sequences, and then generates a position weight matrix (PWM) from the consensus. The BEAM-PRISM combination is extremely robust to noise, identifying cis-regulatory elements of arbitrary length and degeneracy in seconds. In comparison tests, BEAM-PRISM outperformed seven other motif-finding algorithms on 28 S. cerevisiae regulons.
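To make the beam-search idea concrete, the following is a minimal sketch of a beam search over non-degenerate motifs scored with direct counts. It is not the BEAM implementation. The function names, the toy objective (foreground hit rate minus background hit rate), the explicit background list standing in for species-specific background sequences, and the example sequences are all assumptions made for illustration.

```python
ALPHABET = "ACGT"


def hit_rate(motif, sequences):
    """Fraction of sequences containing the motif, from direct counts."""
    return sum(1 for seq in sequences if motif in seq) / len(sequences)


def score(motif, foreground, background):
    """Toy objective (an assumption): foreground minus background hit rate."""
    return hit_rate(motif, foreground) - hit_rate(motif, background)


def top_candidates(candidates, foreground, background, beam_width):
    """Keep the beam_width best motifs; ties are broken alphabetically."""
    ranked = sorted(
        candidates,
        key=lambda m: (-score(m, foreground, background), m),
    )
    return ranked[:beam_width]


def beam_search(foreground, background, length, beam_width=5):
    """Grow motifs one base at a time, keeping only high-scoring prefixes."""
    beam = top_candidates(ALPHABET, foreground, background, beam_width)
    for _ in range(length - 1):
        # Extend every surviving motif by each base, then prune again.
        extended = [motif + base for motif in beam for base in ALPHABET]
        beam = top_candidates(extended, foreground, background, beam_width)
    return beam


# Made-up upstream sequences sharing the planted site TGACTC.
foreground = ["AATGACTCAA", "CCTGACTCGG", "GTTGACTCTA"]
# Made-up background sequences that lack the planted site.
background = ["AACCGGTTAA", "CAGTCAGTCA", "GGATCCATGG"]

best = beam_search(foreground, background, length=6, beam_width=5)
print(best[0])  # → TGACTC
```

Because only `beam_width` motifs survive each round, the search scores at most `4 * beam_width` candidates per added position rather than all `4 ** length` possible motifs. That reduction is what makes counting occurrences directly affordable in this sketch.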