ways to determine search results through
measuring link popularity.
Kleinberg, a professor of computer
science at Cornell, also worked on IBM’s
Clever Project. Kleinberg’s research in
this area relies on what he calls hubs
and authorities: The way to find good
hub pages is to find good authority pages
that they link to, and the way to find good
authorities is to find good hubs that link
to the authorities. In a sense, this circular problem amounts to a mathematical
version of Oroboros, the mythological
serpent that swallows its own tail. And it
is here that spectral graph theory made
one of its most important contributions
to Web search by offering a way to break
that circularity.
Kleinberg explains that overcoming
this problem involves using an eigenvector to determine the relative value
of a node’s endorsement. If one node
endorses its neighbor, the quality of the
endorsement would be equivalent to
the sum of the qualities of every node
endorsing that neighbor node, and so
on. You can keep playing this hub-and-authority game all the way out to infinity.
Eventually, though, things start to stabilize. “What they start to stabilize on is
an eigenvector of the graph,” Kleinberg
says. “In Web search, an eigenvector of
the graph is essentially the infinite limit
of a sequence of increasingly refined estimates of quality.”
The table here, derived from Kleinberg’s seminal 1997 IBM research report titled Authoritative Sources in a
Hyperlinked Context, serves as a good
example for how automated semantic
separation can be accomplished with
eigenvectors. On the most positive
weight nodes of the first eigenvector for
the search “jaguar,” spectral analysis
turned up sites related to the Atari Jaguar video game console. On the second
eigenvector, spectral analysis turned
up sites related to the Jacksonville Jaguars football team. And on the third
eigenvector, spectral analysis turned up
Jaguar car dealers. The numbers in the
table represent coordinates from the
first few eigenvectors; thus, the pages
with large coordinates in each of these
eigenvectors correspond to results for
different meanings of the query term.
This is the sense in which the different
eigenvectors are serving to distinguish
between different meanings of a query.
According to Kleinberg, the ability
spectral graph
theory is the natural
mathematical
language for talking
about ranking on
the Web, says Jon
Kleinberg.
to break through the problem of circularity with eigenvectors is why spectral
graph theory is the natural mathematical language for talking about ranking
on the Web. “But I think there is more
basic research that needs to be done because it’s not clear that we’ve seen the
full power of the technique,” he says.
Kleinberg points out that most applications of spectral graph theory have
been related to finding global, large-scale features, such as the highest-qual-ity Web pages or the most significant
user clusters in a customer database.
He suggests that spectral analysis is not
as naturally suited to finding needle-in-a-haystack details, such as 100 interesting Web pages in billions of Web pages
or a few unusual purchases in terabytes
of customer data. And when it comes to
the theory itself, Kleinberg says that in
some cases there are no guarantees for
the quality of the solutions that spectral
analysis offers. “So, trying to actually
prove some guarantees on the performance of algorithms based on spectral
analysis is a very nice open question,”
he says. “And in the process of trying to
find proofs for the performance of spectral algorithms, we might in the process
invent new algorithms.”
Breakthroughs or not, it seems likely
that spectral analysis will remain a favorite tool among mathematicians and
computer scientists working with vast
amounts of data, including those doing
work in computer vision, image recognition, and any other area of research
that might benefit from advanced pat-tern-recognition strategies.
Based in Los Angeles, Kirk L. Kroeker is a freelance
editor and writer specializing in science and technology.
Science
Natural
Storage
DNA is seen by many
researchers as the classic
information storage system.
Just four bases (adenine,
thymine, guanine, and cytosine)
are required to code ongoing
combinations of multiple amino
acids that ultimately steer the
cellular machinery. Inspired by
the notion that DNA could be
used to store nonbiological data,
a team of researchers in Japan
has created the first DNA strand
made from artificial bases. The
researchers, in a paper recently
published in the Journal of the
American Chemical Society, explain
all the components of their DNA
product are nonnatural, yet they
spontaneously form duplexes
with the corresponding opposite
base, and these bonds are very
similar properties to those
found in natural DNA. The
team hopes this artificial DNA
could offer a range of real-world
applications, from using
DNA to store data, to using
it in biomedical and
nanotechnology settings.
Computer Science
CS Award
Winners
IEEE Honors
Gordon E. Moore will receive
the IEEE Medal of Honor at
the upcoming IEEE Annual
Honors Ceremony, Sept. 20 in
Quebec City, Canada. Other
individuals to be recognized for
work that “improved the world
in which we live” are: Ralph
H. Baer, Sir Timothy Berners-Lee, Joseph Bordogna, Don
Coppersmith, Teuvo Kohonen,
Leslie Lamport, Chrysostomos
L. Nikias, Richard F. Rashid, Raj
Reddy, and Alan Jay Smith.
EATCS Award
Leslie G. Valiant received the
2008 EATCS Award from the
European Association for
Theoretical Computer Science in
recognition of his distinguished
career in theoretical computer
science.