Panel 1


Finding Motifs

A frequent first assumption in the de novo identification of binding sites for regulatory factors is to equate biological significance with statistical over-representation (the tendency of a short DNA sequence or motif to occur more often in a set of upstream sequences than expected given its background frequency of occurrence in the genome). At least fifteen different motif-finding algorithms have been developed over the years. Most belong to one of two general classes:

Most pattern-driven methods are based on an exhaustive enumeration of motifs. This process generates many low scoring motifs unlikely to expand to a cis-regulatory element, as well as artificially high scoring artifacts contained within or overlapping the actual cis-regulatory elements.

BEAM: a novel algorithm for motif discovery

Here, we present an enumerative algorithm, BEAM (Beam-search Enumerative Algorithm for Motif-finding), that is based on a pruned breadth-first search, called a beam search. A search tree that focuses only on the high scoring submotifs at every length will be likely to identify the biologically relevant motifs while reducing the computational complexity from exponential to linear dependency on motif lengths. This effectively removes the motif length restrictions that limit pattern-driven methods.

Pruning the search space provides another advantage. Aggressive pruning reduces the running time enough to admit computationally more expensive objective functions that can be used to evaluate the biological relevance of motifs. Thus, functions that measure positional biases in the occurrences of the motif, coverage of the gene group by a given motif, or other motif properties may be better suited to the identification of cis-regulatory elements.

As shown in Figure 1, at each motif length the list is sorted by the motif score generated by the chosen objective function (e.g. over-representation). Biologically significant motifs within the zone of significance (A) are a subset of the motifs contained within the beam width (B). The majority of the motifs are pruned out of the tree (C).

Abstract
Panel 1

Panel 3

Panel 5

Panel 2

Panel 4

Panel 6