design. How can we best construct
reward functions operating on different time scales? What is the relationship between the realizable and
agnostic versions of this setting, and
how can we construct an algorithm
that smoothly interpolates between
the two?
2. How can we learn from lots of
data? We will be presenting a KDD
survey/tutorial about what has been
done. Some of the larger-scale learning problems have been addressed effectively using MapReduce. The best
example I know is Ozgur Cetin’s algorithm at Yahoo! It is preconditioned
conjugate gradient with a Newton
stepsize using two passes over examples per step. (A non-Hadoop version
is implemented in Vowpal Wabbit
for reference.) But linear predictors
are not enough; we would like learning algorithms that can, for example,
learn from all the images in the world.
Doing this well plausibly requires a
new approach and new learning algorithms. A key observation here is that
the bandwidth required by the learning algorithm cannot be too great.
3. How can we learn to index efficiently? The standard solution in information retrieval is to evaluate (or
approximately evaluate) all objects
in a database returning the elements
with the largest score according to
some learned or constructed scoring
function. This is an inherently O(n)
operation, which is frustrating when
it’s plausible that an exponentially
faster O(log(n)) solution exists. A good
solution here involves both theory and
empirical work, as we need to think
about how to think about the problem, and of course we need to test
any proposed solution in practice.
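To make the O(n)-versus-O(log(n)) gap concrete, here is a toy contrast (all names illustrative). The logarithmic case is deliberately a special one, where items are sorted on a single key and the score is nearest-key distance; the open problem is achieving this kind of speedup for learned scoring functions in general.

```python
import bisect

def top1_scan(items, score):
    """O(n) baseline: score every item in the database, return the best."""
    return max(items, key=score)

def top1_bisect(sorted_items, q):
    """O(log n) special case: items are sorted on a single key and the
    score is -|x - q| (nearest key wins), so binary search locates the
    winner without scoring everything."""
    i = bisect.bisect_left(sorted_items, q)
    # Only the neighbors of the insertion point can be closest.
    candidates = sorted_items[max(0, i - 1):i + 1]
    return min(candidates, key=lambda x: abs(x - q))

items = [3, 8, 15, 22, 40]
best_scan = top1_scan(items, lambda x: -abs(x - 17))
best_tree = top1_bisect(items, 17)
```

Both calls return the same element, but the second touches only O(log n) of the database; the research question is how far this idea extends beyond such contrived monotone structure.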
4. What is a flexible, inherently efficient language for architecting representations for learning algorithms?
Right now, graphical models often
get (mis)used for this purpose. It is
easy and natural to pose a computationally intractable graphical model,
implying many real applications involve approximations. A better solution would be to use a different representation language that was always
computationally tractable yet flexible
enough to solve real-world problems.
One starting point for this is Searn.
Another general approach was the
topic of the Coarse-to-Fine Learning
and Inference Workshop. These are
inherently related as coarse-to-fine is
a pruned breadth-first search. Restated, it is not enough to have a language
for specifying your prior structural beliefs; instead we must have a language
that results in computationally tractable solutions.
5. The deep learning problem remains interesting. How do you effectively learn complex nonlinearities
capable of better performance than
a basic linear predictor? An effective
solution avoids feature engineering.
Right now, this is almost entirely dealt
with empirically, but theory could easily have a role to play in phrasing appropriate optimization algorithms.
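A classic illustration of why nonlinearity beats a basic linear predictor (a hand-built sketch, not a learned model): no linear predictor can compute XOR, but a two-layer network with a single step nonlinearity can. All weights below are set by hand for clarity.

```python
import numpy as np

def step(z):
    """Hard threshold nonlinearity."""
    return (z > 0).astype(float)

def two_layer_xor(x):
    """XOR via one hidden layer: h1 = x1 OR x2, h2 = x1 AND x2,
    output fires when OR is true but AND is not. A linear predictor
    cannot represent this function; one hidden layer suffices."""
    W1 = np.array([[1.0, 1.0],    # OR unit
                   [1.0, 1.0]])   # AND unit
    b1 = np.array([-0.5, -1.5])
    h = step(x @ W1.T + b1)
    w2 = np.array([1.0, -2.0])    # fire on OR but veto on AND
    return step(h @ w2 - 0.5)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
preds = two_layer_xor(X)
```

The deep learning problem is learning such compositions of nonlinearities from data rather than engineering them by hand, as this sketch does.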
Good solutions to each of these research directions would result in revolutions in their area, and every one of
them would plausibly see wide applicability.
“Research in Agile”
June 20, 2011
I am an enthusiastic advocate of agile
software development practices like
Scrum. Its ability to allow teams to
focus on delivering product and communicate status has made it one of the
easiest and best software development
techniques I have seen in a career that
has used ad hoc, Waterfall, and everything in between.
Recent research from New Zealand
has furthered the cause by performing
a study that involved 58 practitioners
in 23 organizations over four years. In
reading a Victoria University of Wellington article on “Smarter Software
Development” and then looking at
Rashina Hoda’s thesis “Self-Organizing Agile Teams: A Grounded Theory,”
there are two interesting takeaways:
1. Self-organizing Scrum teams naturally perform a balancing act between:
˲ Freedom and responsibility: The
team is responsible for collective decision making, assignment, commitment, and measurement, and must
choose to do them.
˲ Cross functionality and specialization: The team is responsible for deciding when to distribute work across
team members or have each focus on a
certain part of the project.
˲ Continuous learning and iteration pressure: The team is responsible for delivering on its own schedule and for holding retrospectives to improve how it works.
The advantage of giving this balancing act to the team is that it takes
ownership of the solution with the
full understanding of all the trade-offs
that will need to occur each sprint. By
distributing the work to the team, it
also makes team members accountable to one another for making sure
the goals are achieved.
2. Self-organizing teams have their
members assume some well-defined
roles spontaneously, informally, and
transiently to help make their projects
succeed:
˲ Mentor: Guides the team in the use
of agile methods.
˲ Coordinator: Manages customer
expectations and collaboration with
the team.
˲ Translator: Translates customer
business requirements to technical requirements and back.
˲ Champion: Advocates agile team
approach with senior management.
˲ Promoter: Works with customers
to explain agile development and how
to collaborate best with the team.
˲ Terminator: Removes team members who hinder the team’s success.
These roles are an emergent property that comes from using agile development methods. They are not prescribed explicitly as part of any of the
agile development philosophies, but
they arise as part of successful use of
agile methods.
I am eager to see more research
emerge as to where agile software
development practices succeed and
where they need improvement. There
is a large body of evidence that shows it
to be a successful strategy, and having
the research to support it would help
encourage its adoption.
John Langford is a senior researcher at Microsoft
Research New York. Ruben Ortega is an engineering
director at Google.