will be needed to tell the difference between, for instance, a spam email message offering you a free cruise and an email message from your parents telling you about their Alaskan cruise.

“By allowing the algorithm to interactively choose the ‘most informative’ samples to be labeled, the algorithm will be able to produce a much more accurate model using fewer labeled samples, meaning that we have to do a lot less tedious labeling by hand to get accurate prediction models,” says Jenn Wortman, a Ph.D. student at the University of Pennsylvania. “It is possible to prove that in some special cases, active learning algorithms can produce models with the same accuracy as models built by passive learning algorithms while requiring exponentially fewer labeled examples.”

 

information overload Of course, deciding what makes an example informative or not is a huge

challenge. Little progress had been made until 2005 when Sanjoy Dasgupta, a professor of computer science at the University of California, San Diego, published a paper, “Coarse Sample Complexity Bounds for Active Learning,” which quantified how many labeled examples are needed to find a pattern with active learning over passive learning. The result was significantly less with active learning, but some basic assumptions were needed to see such dramatic outcomes.

One of the most problematic assumptions was that there could be no mislabeled examples causing noise, which would be unreasonable in real-world applications. The following year, however, Maria-Florina Balcan, Alina Beygelzimer, and John Langford published a paper, “Agnostic Active Learning,” which described the first active learning algorithm that could work with noise but still provided improvement over passive learning. The

algorithm, A2, labeled examples from a region of disagreement at random, eliminating certain hypotheses until a pattern emerged that was as close as possible to a true understanding.

One of the major improvements of A2 was finding a threshold function. With passive learning, every piece of data must be analyzed between two bounds before the threshold can be reliably established. With active learning, however, the algorithm can narrow down the threshold value in an almost binary search, resulting in exponentially fewer labels.

Additional research by Hanneke demonstrated that the algorithm performed well as long as the samples in the region of disagreement weren’t too diverse and didn’t differ in too many ways. Also, further refinements addressed the sampling bias created by labeling only the most informative examples. In essence, the labeled examples were not being chosen at ran-

Career
Computer Science Awards

the national academy of
engineers was among several
professional societies that
recently honored a select
group of researchers for their
distinguished contributions to the
field of computer science.

2009 eAtcS AWARD the european association for theoretical computer Science (eatcS) honored Gérard huet, director of research at the french national institute for research in computer Science and control, with the 2009 eatcS award, in recognition of his distinguished career in theoretical computer science.

tSutomu KAnAi AWARD

Willy Zwaenepoel, a professor

of computer science and director of the computer Systems laboratory at rice university, won the ieee computer

Society tsutomu Kanai award for major contributions to state-of-the-art distributed computing

systems and their applications.

nAe memBeRS

the national academy of engineering elected 65 new members and nine foreign associates for outstanding contributions to “engineering research, practice, or education, including, where appropriate, significant contributions to the engineering literature,” and to the “pioneering of new and developing fields of technology, making major advancements in traditional fields of engineering, or developing/implementing innovative approaches to engineering education.”

fourteen computer scientists are among the new members and foreign associates. they are:

˲ Paul m. anderson, consultant, Power math associates;

˲ Sergey brin, co-founder and president of technology, Google; ˲ William J. dally, William r. and inez Kerr bell Professor of computer Science, Stanford university;

˲ Jeffrey dean, Google fellow, Google;

˲ Jack b. dennis, professor emeritus, computer Science and

artificial intelligence labora-
tory, massachusetts institute of
technology;
˲ deborah l. estrin, director,
center for embedded networked
Sensing, university of california,
los angeles;
˲ Sanjay Ghemawat, Google fel-
low, Google;
˲ Paul c. Kocher, founder,
president, and chief scientist,
cryptography research inc.;
˲ c. mohan, ibm fellow, ibm
almaden research center;
˲ mendel rosenblum, associate
professor of computer science
and of electrical engineering,
Stanford university;
˲ Gurindar S. Sohi, John P.
morgridge Professor and e. david
cronon Professor of computer
Sciences, departments of com-
puter sciences and electrical and
computer engineering, university
of Wisconsin, madison;
˲ John a. Swanson, president,
Swanson analysis Services inc.;
˲ William l. “red” Whittaker,
fredkin Professor of robotics,
the robotics institute, carnegie
mellon university;
˲ Peter t. Kirstein, professor,
department of computer science,
university college london.

SLoAn ReSeARch feLLo WS

the alfred P. Sloan foundation awarded two-year fellowships to 118 researchers, including 16 computer scientists, “in recognition of distinguished performance and a unique potential to make substantial contributions to their field.”

the computer scientists are: Scott aaronson, mit; luis von ahn, carnegie mellon university; Shuchi chawla, university of Wisconsin, madison; Kevin fu, u. of massachusetts–amherst; odest chadwicke Jenkins, brown university; david Kempe, university of Southern california; James russell lee, university of Washington; Zhuoqing morley mao, university of michigan; Kamesh munagala, duke university; tze Sing eugene ng, rice university; ryan William o’donnell, carnegie mellon university; fabio Pellacini, dartmouth college; ramesh raskar, mit; alex c. Snoeren, university of california, San diego; rené Vidal, Johns hopkins university; and Steve Zdancewic, university of Pennsylvania.

12 communicAtionS of the Acm | APriL 2009 | voL. 52 | no. 4

References:

Archives