ticle and it still is not what the article
is about,” Broder says.
Therefore, Broder and the 30 researchers who work for him are finding
ways to glean the meaning of a page.
One promising avenue combines semantic and syntactic features. A semantic phase categorizes the page and the
ads into a 6,000-node topic taxonomy
and compares the proximity of the two
types of classes as a factor in ranking
ads. The hierarchical taxonomy also improves the matching of ads that don’t fit
a page’s exact topic. Keyword matching
is still needed to capture more granular content, such as a specific brand of
automobile. “We decided that what the
article is about should count for about
80% and the words should count for
20%,” Broder says.
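The weighting Broder describes can be illustrated with a small sketch. It is only an illustration under stated assumptions, not Yahoo's method: the toy taxonomy, the proximity formula, the Jaccard keyword overlap, and every function name below are hypothetical. Only the 80/20 split comes from the quote above.

```python
# Hypothetical toy taxonomy: each topic maps to its parent (None for the root).
# The real taxonomy described in the article has about 6,000 nodes.
PARENT = {
    "root": None,
    "autos": "root",
    "autos/sedans": "autos",
    "autos/trucks": "autos",
    "travel": "root",
    "travel/flights": "travel",
}


def path_to_root(node):
    """Return the list of nodes from `node` up to the root, inclusive."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def taxonomy_proximity(a, b):
    """Assumed proximity measure: 1.0 for identical topics, smaller as the
    tree distance through the nearest common ancestor grows."""
    path_a, path_b = path_to_root(a), path_to_root(b)
    ancestors_a = set(path_a)
    for steps_b, node in enumerate(path_b):
        if node in ancestors_a:
            steps_a = path_a.index(node)
            return 1.0 / (1.0 + steps_a + steps_b)
    return 0.0  # no common ancestor (cannot happen in a single-rooted tree)


def keyword_overlap(page_words, ad_words):
    """Assumed syntactic score: Jaccard overlap of the two word sets."""
    page, ad = set(page_words), set(ad_words)
    if not page or not ad:
        return 0.0
    return len(page & ad) / len(page | ad)


def ad_score(page_topic, ad_topic, page_words, ad_words, semantic_weight=0.8):
    """Blend topic proximity (80% by default) with keyword overlap (20%)."""
    semantic = taxonomy_proximity(page_topic, ad_topic)
    syntactic = keyword_overlap(page_words, ad_words)
    return semantic_weight * semantic + (1.0 - semantic_weight) * syntactic


if __name__ == "__main__":
    page_words = ["chevy", "sedan", "review"]
    print(ad_score("autos/sedans", "autos/trucks", page_words, ["chevy", "truck"]))
    print(ad_score("autos/sedans", "travel/flights", page_words, ["cheap", "flights"]))
```

In this sketch an ad filed under a sibling topic still earns a partial semantic score, which mirrors the point that a hierarchical taxonomy helps match ads that do not fit a page's exact topic, while the keyword term captures specifics such as a brand name.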
Another area of interest is using statistical analysis to measure the effect of
exogenous events on browsing behavior
and adjust the advertisements accordingly. Varian cites short-lived examples,
such as this year’s rare snowfall in England, or longer-term ones such as the
worldwide recession. “In the last few
months, there is a big increase in interest in price-sensitive products,” Varian
says. “The advertisers, in turn, are trying to respond.”
All three companies are close-lipped
about which of their research has been
commercialized, but say that new ideas
for algorithms are quickly incorporated
into their bidding mechanisms and advertiser tools. Bottom-line results are
secret, but the search engines all collect
metrics such as revenue per search.
Machine learning, another major
focus, concentrates on training algorithms to scan pages for meaning,
a technique employed successfully
on single-topic documents with the
aid of machine-generated labels, but
trickier to perform on Web pages, with
their assortment of graphics, text, and
topics. Microsoft researchers have
learned how to employ a type of multiple instance learning to automate
classification of sub-documents on
pages with incomplete labels and to
detect the presence of certain types of
content.
“Most of what we do can be boiled
down to understanding intent,” says
Eric Brill, general manager of Microsoft adCenter Labs. By analyzing search
strings, for example, algorithms can
predict if a person is interested in ads.
Some strings are pure attempts at finding information, while others, such as
“buy Canon digital camera,” have clear
commercial intent. “When consumers don’t have commercial intent, you
don’t want to put ads in front of them,”
Brill says.
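The distinction Brill draws can be made concrete with a deliberately crude sketch. It assumes a hand-picked list of marker words and simple word matching, which stands in for (and is far simpler than) the learned models the article describes; the word list and function names are hypothetical.

```python
# Hypothetical marker words; a production system would learn such signals
# from data rather than rely on a fixed list.
COMMERCIAL_MARKERS = {"buy", "price", "cheap", "deal", "discount", "order", "purchase"}


def has_commercial_intent(query):
    """Return True if any word in the query is a commercial marker word."""
    words = query.lower().split()
    return any(word in COMMERCIAL_MARKERS for word in words)


def should_show_ads(query):
    """Show ads only for queries that look commercial, per Brill's rule of thumb."""
    return has_commercial_intent(query)


if __name__ == "__main__":
    for q in ["buy Canon digital camera", "history of the camera"]:
        print(q, "->", should_show_ads(q))
```

A word list like this misses commercial queries that contain no marker word, which is one reason the researchers turn to statistical models instead.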
Much work focuses on ensuring that
new bidding mechanisms don’t have
incentives for advertisers to misrepresent click-through rates to get better ad
placement. In the decentralized economy of the Internet, truthfulness is a
currency reinforced by carefully crafted
algorithms. “People are out there to
make money,” says Thore Graepel, a senior researcher at Microsoft Research.
“We need to build mechanisms where
everyone benefits.”
One might expect the speed and volume of data to create a capacity problem, but the researchers express mixed
opinions. Graepel says semantic analysis creates an extra burden. “You will
hit a computational bottleneck, that’s
pretty clear,” he says. To avoid this, researchers optimize algorithms to make
the best decisions with the smallest
possible data sets. But they also have
faith in engineers’ ability to exploit
techniques such as parallel processing. “It’s surprising how they are always
able to scale to deal with these new algorithms,” Varian says.
Privacy regulations remain an obstacle to personalizing ads, says Graepel.
The existing opt-in, opt-out model lets
users choose to reveal personal data in
exchange for discounts and other incentives. Researchers are also investigating aggregating data on Web traffic
to more accurately match ad categories
with coarsely defined groups of users
who identify their interests simply by
visiting certain types of Web sites.
Fortunately, there is hope for avoiding embarrassments like the ill-placed
Chevy ad. Researchers at Microsoft
adCenter Labs claim their sub-document classification methods can prevent incompatible ads and Web sites
from ever hooking up. You might call
it a reverse matchmaker, just the sort
of odd little entity the Internet’s inventors might never have imagined.
David Essex is a freelance science writer based in
Peterborough, NH.
© 2009 ACM 0001-0782/09/0500 $5.00
Education
Computer Science Enrollment Increases
Enrollment in computer
science classes in the United
States has increased for the
first time in six years, according
to the Computing Research
Association’s (CRA’s) annual
Taulbee Survey.
Total enrollment by majors
and pre-majors in computer
science is up 6.2% per department
over last year. If only majors are
considered, the increase is 8.1%,
according to the CRA survey,
which collected enrollment
data in fall 2008 from computer
science and computer
engineering departments at 192
Ph.D.-granting universities.
“The upward surge of
student interest is real and
bigger than anyone expected,”
says Peter Lee, incoming chair
of CRA. “The fact that computer
science graduates usually find
themselves in high-paying jobs
accounts for part of the reversal.
Increasingly students also are
attracted to the intellectual
depth and societal benefits of
computing technology.”
Computer science graduates
on average earn 13% more than
the average college graduate,
according to the U.S. Department
of Labor, and future job prospects
for computer science graduates
are better than for any other
science or engineering field.
The average number of
new students per department
majoring in computer science is
up 9.5% over last year. Computer
science departments are
replenishing the freshman and
sophomore ranks with larger
groups than they are graduating
as seniors, and computer science
graduation rates should increase
in two to four years as these new
students graduate.
The total number of Ph.D.
graduates among responding
departments grew to 1,877 for
the period July 2007 to June 2008, a
5.7% increase over the previous year.
One area that didn’t show
improvement is the number
of women pursuing computer
science degrees, which held
steady at 11.8%.