ticle and it still is not what the article is about,” Broder says.
Therefore, Broder and the 30 researchers who work for him are finding ways to glean the meaning of a page. One promising avenue combines semantic and syntactic features. A semantic phase classifies both the page and the ads into a 6,000-node topic taxonomy and uses how close the two sets of classes are as a factor in ranking ads. The hierarchical taxonomy also improves the matching of ads that don’t fit a page’s exact topic. Keyword matching is still needed to capture more granular content, such as a specific brand of automobile. “We decided that what the article is about should count for about 80% and the words should count for 20%,” Broder says.
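To make the 80/20 blend concrete, here is a minimal Python sketch. It assumes that topic proximity can be measured as the number of edges between the page’s class and the ad’s class in a hierarchical taxonomy, and that keyword matching can be approximated by the overlap (Jaccard similarity) of two word sets. The toy taxonomy and every function name are hypothetical; only the 0.8/0.2 weights come from Broder’s description, and this is not the researchers’ production scoring code.

```python
# Hypothetical toy taxonomy: each topic maps to its parent (None marks the root).
TAXONOMY = {
    "root": None,
    "autos": "root",
    "cars": "autos",
    "trucks": "autos",
    "finance": "root",
    "loans": "finance",
}


def path_to_root(topic):
    """Return the topics from `topic` up to the root, inclusive."""
    path = []
    while topic is not None:
        path.append(topic)
        topic = TAXONOMY[topic]
    return path


def tree_distance(a, b):
    """Number of edges between two topics, via their lowest common ancestor."""
    pa, pb = path_to_root(a), path_to_root(b)
    ancestors_b = set(pb)
    for i, node in enumerate(pa):
        if node in ancestors_b:
            return i + pb.index(node)
    raise ValueError("topics are not in the same tree")


def topic_proximity(page_topic, ad_topic):
    """Map tree distance to a score in (0, 1]; identical topics score 1."""
    return 1.0 / (1.0 + tree_distance(page_topic, ad_topic))


def keyword_overlap(page_words, ad_words):
    """Jaccard overlap of the two keyword sets, in [0, 1]."""
    page_words, ad_words = set(page_words), set(ad_words)
    if not page_words or not ad_words:
        return 0.0
    return len(page_words & ad_words) / len(page_words | ad_words)


def ad_score(page_topic, page_words, ad_topic, ad_words,
             semantic_weight=0.8, keyword_weight=0.2):
    """Blend topic proximity (80%) with keyword overlap (20%)."""
    return (semantic_weight * topic_proximity(page_topic, ad_topic)
            + keyword_weight * keyword_overlap(page_words, ad_words))


page = ["sedan", "mileage", "chevy"]
print(round(ad_score("cars", page, "cars", ["chevy", "sedan"]), 3))  # 0.933
print(round(ad_score("cars", page, "trucks", []), 3))                # 0.267
print(round(ad_score("cars", page, "loans", []), 3))                 # 0.16
```

A truck ad on a car page still earns partial credit because the two topics share the “autos” parent, which is the kind of near-miss matching a hierarchical taxonomy allows; an unrelated finance ad scores lower still.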
Another area of interest is using statistical analysis to measure the effect of exogenous events on browsing behavior and adjust the advertisements accordingly. Varian cites short-lived examples, such as this year’s rare snowfall in England, or longer-term ones such as the worldwide recession. “In the last few months, there is a big increase in interest in price-sensitive products,” Varian says. “The advertisers, in turn, are trying to respond.”
All three companies are close-lipped about which of their research has been commercialized, but say that new ideas for algorithms are quickly incorporated into their bidding mechanisms and advertiser tools. Bottom-line results are secret, but the search engines all collect metrics such as revenue per search.
Machine learning, another major focus, concentrates on training algorithms to scan pages for meaning. The technique has been employed successfully on single-topic documents with the aid of machine-generated labels, but it is trickier to apply to Web pages, with their assortment of graphics, text, and topics. Microsoft researchers have learned how to employ a type of multiple instance learning to automate classification of sub-documents on pages with incomplete labels and to detect the presence of certain types of content.
“Most of what we do can be boiled down to understanding intent,” says Eric Brill, general manager of Microsoft adCenter Labs. By analyzing search strings, for example, algorithms can predict whether a person is interested in ads. Some strings are pure attempts at finding information, while others, such as “buy Canon digital camera,” have clear commercial intent. “When consumers don’t have commercial intent, you don’t want to put ads in front of them,” Brill says.
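A deliberately simple sketch of the idea: flag a query as commercial when it contains cue words such as “buy,” and show ads only then. The cue list, the threshold, and the function names below are hypothetical; systems like the ones Brill describes would learn intent signals from large volumes of query data rather than rely on a hand-written word list.

```python
import re

# Hypothetical cue words; a real system would learn intent signals from data.
COMMERCIAL_CUES = {"buy", "price", "prices", "cheap", "deal", "discount", "sale", "order"}


def has_commercial_intent(query: str, threshold: int = 1) -> bool:
    """Return True if the query contains at least `threshold` cue words."""
    tokens = re.findall(r"[a-z0-9]+", query.lower())
    hits = sum(1 for token in tokens if token in COMMERCIAL_CUES)
    return hits >= threshold


def should_show_ads(query: str) -> bool:
    """Show ads only when the query looks commercial."""
    return has_commercial_intent(query)


print(should_show_ads("buy Canon digital camera"))             # True
print(should_show_ads("history of the Canon camera company"))  # False
```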
Much work focuses on ensuring that new bidding mechanisms don’t give advertisers incentives to misrepresent click-through rates to get better ad placement. In the decentralized economy of the Internet, truthfulness is a currency reinforced by carefully crafted algorithms. “People are out there to make money,” says Thore Graepel, a senior researcher at Microsoft Research. “We need to build mechanisms where everyone benefits.”
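One textbook mechanism with this property is the sealed-bid second-price (Vickrey) auction: the highest bidder wins but pays the second-highest bid, so bidding one’s true value is a dominant strategy. The article does not say which mechanisms the search engines use, so treat this single-slot Python sketch as an illustration of truthfulness, not as any company’s ad auction; the advertiser names and numbers are hypothetical.

```python
def second_price_auction(bids):
    """Single-slot sealed-bid Vickrey auction.

    bids maps each advertiser to its bid. Returns (winner, price): the
    highest bidder wins and pays the second-highest bid.
    """
    if len(bids) < 2:
        raise ValueError("need at least two bidders")
    ranked = sorted(bids.items(), key=lambda item: item[1], reverse=True)
    winner = ranked[0][0]
    price = ranked[1][1]
    return winner, price


def utility(bidder, true_value, bids):
    """Bidder's payoff: true value minus price if it wins, otherwise zero."""
    winner, price = second_price_auction(bids)
    return true_value - price if winner == bidder else 0.0


# Hypothetical example: advertiser "A" values the slot at 5.0;
# rivals "B" and "C" bid 3.0 and 2.0.
rivals = {"B": 3.0, "C": 2.0}
for bid in [1.0, 2.5, 4.0, 5.0, 8.0]:
    print(bid, utility("A", 5.0, {**rivals, "A": bid}))
```

Overbidding to 8.0 wins at the same price as the truthful bid, and underbidding to 2.5 loses the slot, so in this example A gains nothing by misreporting its value.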
One might expect the speed and volume of data to create a capacity problem, but the researchers express mixed opinions. Graepel says semantic analysis creates an extra burden. “You will hit a computational bottleneck, that’s pretty clear,” he says. To avoid this, researchers optimize algorithms to make the best decisions with the smallest possible data sets. But they also have faith in engineers’ ability to exploit techniques such as parallel processing. “It’s surprising how they are always able to scale to deal with these new algorithms,” Varian says.
Privacy regulations remain an obstacle to personalizing ads, says Graepel. The existing opt-in, opt-out model lets users choose to reveal personal data in exchange for discounts and other incentives. Researchers are also investigating aggregating data on Web traffic to more accurately match ad categories with coarsely defined groups of users who identify their interests simply by visiting certain types of Web sites.
Fortunately, there is hope for avoiding embarrassments like the ill-placed Chevy ad. Researchers at Microsoft adCenter Labs claim their sub-document classification methods can prevent incompatible ads and Web sites from ever hooking up. You might call it a reverse matchmaker, just the sort of odd little entity the Internet’s inventors might never have imagined.
David Essex is a freelance science writer based in Peterborough, NH.
© 2009 ACM 0001-0782/09/0500 $5.00
Enrollment in computer science classes in the United States has increased for the first time in six years, according to the Computing Research Association’s (CRA’s) annual Taulbee Survey.
Total enrollment by majors and pre-majors in computer science is up 6.2% per department over last year. If only majors are considered, the increase is 8.1%, according to the CRA survey, which collected enrollment data in fall 2008 from computer science and computer engineering departments at 192 Ph.D.-granting universities.
“The upward surge of student interest is real and bigger than anyone expected,” says Peter Lee, incoming chair of CRA. “The fact that computer science graduates usually find themselves in high-paying jobs accounts for part of the reversal. Increasingly students also are attracted to the intellectual depth and societal benefits of computing technology.”
Computer science graduates on average earn 13% more than the average college graduate, according to the U.S. Department of Labor, and future job prospects for computer science graduates are better than for any other science or engineering field.
The average number of new students per department majoring in computer science is up 9.5% over last year. Computer science departments are replenishing the freshman and sophomore ranks with larger groups than they are graduating as seniors, and computer science graduation rates should increase in two to four years as these new students graduate.
The total number of Ph.D. graduates among responding departments grew to 1,877 for the period July 2007 to June 2008, a 5.7% increase over the previous year.
One area that didn’t show improvement is the proportion of women pursuing computer science degrees, which held steady at 11.8%.