ways to determine search results through measuring link popularity.
Kleinberg, a professor of computer science at Cornell, also worked on IBM’s Clever Project. Kleinberg’s research in this area relies on what he calls hubs and authorities: The way to find good hub pages is to find good authority pages that they link to, and the way to find good authorities is to find good hubs that link to the authorities. In a sense, this circular problem amounts to a mathematical version of Oroboros, the mythological serpent that swallows its own tail. And it is here that spectral graph theory made one of its most important contributions to Web search by offering a way to break that circularity.
Kleinberg explains that overcoming this problem involves using an eigenvector to determine the relative value of a node’s endorsement. If one node endorses its neighbor, the quality of the endorsement would be equivalent to the sum of the qualities of every node endorsing that neighbor node, and so on. You can keep playing this hub-and-authority game all the way out to infinity. Eventually, though, things start to stabilize. “What they start to stabilize on is an eigenvector of the graph,” Kleinberg says. “In Web search, an eigenvector of the graph is essentially the infinite limit of a sequence of increasingly refined estimates of quality.”
The table here, derived from Kleinberg’s seminal 1997 IBM research report titled Authoritative Sources in a Hyperlinked Context, serves as a good example for how automated semantic separation can be accomplished with eigenvectors. On the most positive weight nodes of the first eigenvector for the search “jaguar,” spectral analysis turned up sites related to the Atari Jaguar video game console. On the second eigenvector, spectral analysis turned up sites related to the Jacksonville Jaguars football team. And on the third eigenvector, spectral analysis turned up Jaguar car dealers. The numbers in the table represent coordinates from the first few eigenvectors; thus, the pages with large coordinates in each of these eigenvectors correspond to results for different meanings of the query term. This is the sense in which the different eigenvectors are serving to distinguish between different meanings of a query.
According to Kleinberg, the ability
to break through the problem of circularity with eigenvectors is why spectral graph theory is the natural mathematical language for talking about ranking on the Web. “But I think there is more basic research that needs to be done because it’s not clear that we’ve seen the full power of the technique,” he says.
Kleinberg points out that most applications of spectral graph theory have been related to finding global, large-scale features, such as the highest-qual-ity Web pages or the most significant user clusters in a customer database. He suggests that spectral analysis is not as naturally suited to finding needle-in-a-haystack details, such as 100 interesting Web pages in billions of Web pages or a few unusual purchases in terabytes of customer data. And when it comes to the theory itself, Kleinberg says that in some cases there are no guarantees for the quality of the solutions that spectral analysis offers. “So, trying to actually prove some guarantees on the performance of algorithms based on spectral analysis is a very nice open question,” he says. “And in the process of trying to find proofs for the performance of spectral algorithms, we might in the process invent new algorithms.”
Breakthroughs or not, it seems likely that spectral analysis will remain a favorite tool among mathematicians and computer scientists working with vast amounts of data, including those doing work in computer vision, image recognition, and any other area of research that might benefit from advanced pat-tern-recognition strategies.
Based in Los Angeles, Kirk L. Kroeker is a freelance editor and writer specializing in science and technology.
DNA is seen by many
researchers as the classic
information storage system.
Just four bases (adenine,
thymine, guanine, and cytosine)
are required to code ongoing
combinations of multiple amino
acids that ultimately steer the
cellular machinery. Inspired by
the notion that DNA could be
used to store nonbiological data,
a team of researchers in Japan
has created the first DNA strand
made from artificial bases. The
researchers, in a paper recently
published in the Journal of the
American Chemical Society, explain
all the components of their DNA
product are nonnatural, yet they
spontaneously form duplexes
with the corresponding opposite
base, and these bonds are very
similar properties to those
found in natural DNA. The
team hopes this artificial DNA
could offer a range of real-world
applications, from using
DNA to store data, to using
it in biomedical and
nanotechnology settings.
IEEE Honors
Gordon E. Moore will receive the IEEE Medal of Honor at the upcoming IEEE Annual Honors Ceremony, Sept. 20 in Quebec City, Canada. Other individuals to be recognized for work that “improved the world in which we live” are: Ralph H. Baer, Sir Timothy Berners-Lee, Joseph Bordogna, Don Coppersmith, Teuvo Kohonen, Leslie Lamport, Chrysostomos L. Nikias, Richard F. Rashid, Raj Reddy, and Alan Jay Smith.
EATCS Award
Leslie G. Valiant received the 2008 EATCS Award from the European Association for Theoretical Computer Science in recognition of his distinguished career in theoretical computer science.
References:
Archives