Dartmouth Faculty Research and Scholarship Today
Dartmouth College

Bringing Data Retrieval into the 21st Century

George Cybenko is the Dorothy and Walter Gramm Professor of Engineering Sciences.

Inventing a new and dynamic approach to computer security

Legend has it Sir Isaac Newton was idling beneath an apple tree when a falling piece of fruit caught his attention, planting the seeds of his theory of gravity. While Thayer School of Engineering Professor George Cybenko’s new approach to organizing and retrieving information evolved from less fanciful origins, he admits the basic idea was something of an epiphany.

The technology Cybenko is working on is a radical departure from traditional information retrieval methods. Historically, he says, people have searched for the information they need by applying an “Aristotelian” approach. That strategy, like the ancient philosopher’s logic, is based on premises or “givens.” If you have ever searched the Internet with a Web search engine, you have built the rules for your search by writing a Boolean expression. For example, you might ask for any documents containing both the words “Dartmouth” and “engineering.” That kind of approach doesn’t work in today’s large-scale, complex information environment, according to Cybenko.
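To make the contrast concrete, here is a minimal Python sketch of the rule-based, Boolean style of query described above: a document matches only if it contains every listed term. The function name `matches_all` and the sample documents are illustrative assumptions, and the whitespace tokenization is deliberately naive; real search engines add indexing, stemming, and ranking.

```python
def matches_all(document: str, terms: list[str]) -> bool:
    """Return True if every term appears in the document (a Boolean AND query)."""
    # Naive tokenization: lowercase and split on whitespace.
    words = set(document.lower().split())
    return all(term.lower() in words for term in terms)


# Hypothetical sample documents.
documents = [
    "Dartmouth offers engineering degrees through Thayer School",
    "Engineering news from around the world",
    "Dartmouth College admissions",
]

query = ["Dartmouth", "engineering"]
hits = [doc for doc in documents if matches_all(doc, query)]
print(hits)  # ['Dartmouth offers engineering degrees through Thayer School']
```

Each document is judged in isolation against a fixed rule; nothing in the query captures order, timing, or how separate observations relate to one another.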

Last year, he realized that “the reason a lot of approaches were failing or stalling is that they literally were trying to model the world the way Aristotelian science did up to the 1500s or 1600s. It failed because it just couldn’t explain nature, until Newton came along with a whole different process-oriented way to view the world in terms of states and dynamics.”

Following Newton’s example, the Process Query System (PQS) paradigm that Cybenko has developed focuses on processes rather than rules or static expressions. It describes a sequence of steps and the evidence that supports transitions between those steps. Cybenko and his research team of graduate students, postdoctoral researchers, and research scientists have even developed a working prototype of a PQS called TRAFEN (TRAcking and Fusion ENgine).

One of the specific applications Cybenko’s group has investigated is the detection of computer worms. A worm is a piece of self-propagating code that goes through stages in which it (1) finds other computers, (2) moves to those machines, (3) infects them, and (4) modifies files on them. Different worms use different mechanisms to propagate, but all go through these stages, each of which is characterized by specific indicators or behavior.
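A minimal sketch, assuming each host’s activity has already been reduced to a stream of labeled events, shows how the four stages can be treated as an ordered process rather than as independent rules. The event labels (`scan`, `connect`, `infect`, `modify`), the helper names, and the sample log are hypothetical and do not come from TRAFEN.

```python
from collections import defaultdict

# Hypothetical event labels for the four worm stages described above.
WORM_STAGES = ["scan", "connect", "infect", "modify"]


def stages_reached(events, stages=WORM_STAGES):
    """Count how many stages the event stream has passed through, in order.

    Unrelated events are ignored; a stage only counts once the previous
    stage has been observed, so the model tracks a process, not a single rule.
    """
    current = 0
    for event in events:
        if current < len(stages) and event == stages[current]:
            current += 1
    return current


def flag_hosts(log):
    """Return the hosts whose events walk through every worm stage."""
    per_host = defaultdict(list)
    for host, event in log:
        per_host[host].append(event)
    return sorted(
        host for host, events in per_host.items()
        if stages_reached(events) == len(WORM_STAGES)
    )


# Hypothetical (host, event) log with two interleaved hosts.
log = [
    ("10.0.0.5", "scan"), ("10.0.0.7", "login"),
    ("10.0.0.5", "connect"), ("10.0.0.7", "modify"),
    ("10.0.0.5", "infect"), ("10.0.0.5", "modify"),
]
print(flag_hosts(log))  # ['10.0.0.5']
```

Host 10.0.0.7 also modifies files, but because that event is not preceded by the earlier stages, it does not advance the process model.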


The figure depicts a simple model of a dynamic process in which internal, nonobservable states (A, B, C) emit events (α, β, γ), but no event is uniquely associated with a single hidden state. A Process Query System observes sequences of events and builds associations between them and sequences of hidden states of the underlying process.
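One common way to formalize a model like the one in the figure is a hidden Markov model, in which the Viterbi algorithm recovers the most likely sequence of hidden states behind a sequence of observed events. The sketch below assumes that formalization; the article does not say TRAFEN uses it, and all probabilities are invented for illustration. The caption’s three events are spelled out in code as `alpha`, `beta`, and `gamma`.

```python
import math

STATES = ["A", "B", "C"]

# Illustrative numbers only; the figure gives no probabilities.
START = {"A": 0.6, "B": 0.3, "C": 0.1}
TRANS = {  # TRANS[p][s]: probability of moving from state p to state s
    "A": {"A": 0.5, "B": 0.4, "C": 0.1},
    "B": {"A": 0.1, "B": 0.5, "C": 0.4},
    "C": {"A": 0.3, "B": 0.1, "C": 0.6},
}
EMIT = {  # EMIT[s][e]: probability that state s emits event e
    "A": {"alpha": 0.7, "beta": 0.2, "gamma": 0.1},
    "B": {"alpha": 0.2, "beta": 0.6, "gamma": 0.2},
    "C": {"alpha": 0.1, "beta": 0.3, "gamma": 0.6},
}


def viterbi(events):
    """Return the most likely hidden-state path for `events` and its log-probability."""
    # best[s] = (log-prob of the best path ending in state s, that path)
    best = {s: (math.log(START[s] * EMIT[s][events[0]]), [s]) for s in STATES}
    for event in events[1:]:
        step = {}
        for s in STATES:
            # Choose the predecessor state that maximizes the path probability.
            logp, path = max(
                (best[p][0] + math.log(TRANS[p][s]), best[p][1]) for p in STATES
            )
            step[s] = (logp + math.log(EMIT[s][event]), path + [s])
        best = step
    logp, path = max(best.values())
    return path, logp


path, logp = viterbi(["alpha", "beta", "beta", "gamma", "gamma"])
print(path)  # ['A', 'B', 'B', 'C', 'C']
```

Because every state can emit every event with some probability, no single observation identifies the hidden state; the algorithm weighs the whole sequence together with the transition probabilities. Log-probabilities are used so that long sequences do not underflow.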

TRAFEN uses detailed descriptions of this behavior to locate computer infections. In other words, “This is what a worm does; find anything that has that behavior.” The more thoroughly the process is described, the fewer false alarms the system raises, although Cybenko cautions that being too specific could cause some worms to be overlooked altogether. Another application under development involves tracking vehicles with a network of acoustic sensors. Ultimately, Cybenko argues, Process Query System approaches are more efficient and scalable than Aristotelian approaches, which depend on rule-based processing of individual observations.
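The trade-off Cybenko describes can be illustrated with two hypothetical process models: a loose one with only two stages, and a strict one that also requires a particular infection mechanism. The stage names, the `infect_smb` and `infect_email` labels, and the sample event streams are assumptions made for this sketch only.

```python
def follows(events, stages):
    """True if the stages occur in order in the event stream; other events are skipped."""
    remaining = iter(events)
    # `stage in remaining` consumes the iterator up to the match, which enforces order.
    return all(stage in remaining for stage in stages)


# Hypothetical process models of differing specificity.
LOOSE = ["connect", "modify"]
STRICT = ["scan", "connect", "infect_smb", "modify"]

streams = {
    "admin_backup": ["login", "connect", "modify"],            # benign activity
    "worm_v1": ["scan", "connect", "infect_smb", "modify"],
    "worm_v2": ["scan", "connect", "infect_email", "modify"],  # different mechanism
}

for name, model in [("loose", LOOSE), ("strict", STRICT)]:
    flagged = [s for s, events in streams.items() if follows(events, model)]
    print(name, flagged)
# loose ['admin_backup', 'worm_v1', 'worm_v2']
# strict ['worm_v1']
```

The loose model raises a false alarm on the benign backup job, while the strict model ignores it but also misses `worm_v2`, which spreads by a different mechanism.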

The PQS breakthrough is an idea whose time has come, says Cybenko. “It’s only become obvious in the last few years that existing solutions aren’t scaling with the scope of the problem because our ability to network and collect a lot of information is relatively new. Although, in principle, PQS could have been developed 10 years ago, the computing power and motivation to drive the need for a new approach were not quite there. Computing power today is very small and cheap, with established Web and Internet standards, so it’s very easy now to connect many different devices to a big network. All of a sudden, you can collect and contemplate the ability to aggregate very quickly—in real time—a lot of information. So, how do you do that? People haven’t really been thinking about that much. It’s a new kind of problem.”

He credits the team’s timely success to its interdisciplinary composition, with expertise spanning computer science, communication, and mathematics. “Researchers tend to be disciplinary, so if you’re a computer scientist trying to tackle these problems, you probably don’t have the systems theory or the electrical engineering background that’s necessary to think the Newtonian way,” he says. “The flip side is, if you’re an electrical engineer, you probably don’t realize that there are problems in computer security or network management that can be formulated in terms of process detection. So I think our success is largely due to the right mix of people with the right mix of backgrounds.”

He adds that they have been working on related areas for five or six years, “so we’ve built up a solid repertoire of ideas, technologies and knowledge about what’s going on.”

The team’s novel approach to information retrieval has already stimulated market interest, and the U.S. government is particularly interested in applying the findings. In fact, the research was supported by grants from the Advanced Research and Development Activity, the Department of Homeland Security, and the Defense Advanced Research Projects Agency. Commercial software companies are also expressing interest in developing products around the PQS ideas.

PQS can be used to detect various types of physical behavior in the environment, such as vehicles moving through a region or plumes of airborne chemical or biological agents, which offers obvious security advantages.

Beyond that, he says, the government and others are interested in the research because “people are recognizing that in many applications, like infrastructure monitoring and large-scale sensor networks, nobody really has a good idea of how to proceed because they’re still thinking the Aristotelian way.”

It is, he adds, “high-risk, high-payoff research. We’re trying to change how people have been doing things for a long time.”


Copyright © 2003 Trustees of Dartmouth College