and should be rewritten, but it does include the basic analysis.
The decision to perform open refereeing was personal and, until now, I
have always refrained from proselytizing. Seeing the degradation in refereeing, however, I believe such reserve is
no longer appropriate. Establishing
open refereeing as the default strategy
is the first step toward fixing the flawed
culture of computer science refereeing.
Greg Linden
“Massive-Scale Data Mining for Education”
http://cacm.acm.org/blogs/blog-cacm/101489
November 10, 2010
Let’s say, in the near future, tens of
millions of students start learning
math using online computer software.
Our logs fill with a massive new data
stream, millions of students doing billions of exercises, as the students work.
In these logs, we will see some students struggle with some problems,
then overcome them. Others will struggle with those same problems and fail.
There will be paths of learning in the
data, some of which quickly reach mastery, others of which go off in the weeds.
At Amazon.com a decade ago, we
studied the trails people made as
they moved through our Web site. We
looked at the probability that people
would click on links to go from one
page to another. We watched the trails
people took through our site and where
they went astray. As people shopped,
we learned how to make shopping easier for others in the future.
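As a rough illustration of the click-trail analysis described above, the sketch below estimates the probability of moving from one page to another by counting consecutive page pairs. The function name, page names, and trails are all invented for this example; it is not Amazon's actual method, only one simple way such probabilities could be computed.

```python
from collections import Counter, defaultdict


def transition_probabilities(trails):
    """Estimate P(next page | current page) from a list of page trails."""
    counts = defaultdict(Counter)
    for trail in trails:
        # Count each consecutive (current, next) pair in the trail.
        for current, nxt in zip(trail, trail[1:]):
            counts[current][nxt] += 1

    probs = {}
    for current, nexts in counts.items():
        total = sum(nexts.values())
        probs[current] = {page: n / total for page, n in nexts.items()}
    return probs


# Hypothetical trails, made up for illustration only.
trails = [
    ["home", "search", "product", "cart"],
    ["home", "search", "search", "product"],
    ["home", "product", "cart"],
    ["home", "search", "exit"],
]
probs = transition_probabilities(trails)
print(probs["home"])
```

Transitions with a high probability of ending at "exit" would be one hint of where people go astray.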
Similarly, Google and Microsoft
learn from people using Web search.
When people find what they want,
Google notices. When other people
do that same search later, Google has
learned from earlier searchers, and
makes it easier for the new searchers to
get where they want to go.
Beyond a single search, the search
giants watch what people look for over
time as they do many searches—what
they eventually find or whether they
find nothing, where they navigate to
after searching—and learn to push future searchers onto the more successful paths trod by those before them.
So, let’s say we have millions of
students learning math on computers. Let’s say we have massive new
logs of what these students are doing
and how well they are doing. What
would a big Internet company do with
this data? What would be the Googley thing to do with these logs? What
would massive-scale data mining look
like for students?
We could learn that students who
have difficulty solving one problem
would have trouble with another. For
example, perhaps students who have
difficulty with the problem (3x – 7 = 3)
also have difficulty with (2x – 13 = 5).
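One simple way to check for this kind of link, sketched here under invented data, is to compare how often students fail the second problem after failing the first against how often students fail the second problem overall. The function name, the student records, and the problem labels "A" and "B" are all hypothetical.

```python
def co_difficulty(results, a, b):
    """Return (P(fail b | fail a), P(fail b)) over students who tried both.

    `results` maps a student id to {problem: solved?}. Returns None when
    no student attempted both problems or none of them failed problem a.
    """
    both = [r for r in results.values() if a in r and b in r]
    failed_a = [r for r in both if not r[a]]
    if not both or not failed_a:
        return None
    p_b = sum(not r[b] for r in both) / len(both)
    p_b_given_a = sum(not r[b] for r in failed_a) / len(failed_a)
    return p_b_given_a, p_b


# Hypothetical results: "A" stands for (3x - 7 = 3), "B" for (2x - 13 = 5).
results = {
    "s1": {"A": False, "B": False},
    "s2": {"A": False, "B": False},
    "s3": {"A": True, "B": True},
    "s4": {"A": True, "B": False},
    "s5": {"A": False, "B": True},
    "s6": {"A": True, "B": True},
}
p_b_given_a, p_b = co_difficulty(results, "A", "B")
print(p_b_given_a, p_b)
```

If the first number is well above the second, trouble with one problem predicts trouble with the other. With real logs, one would also want enough students per pair for the difference to be meaningful.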
We could then learn of clusters
of problems that will be difficult for
someone to solve if they have the same
misunderstanding of an underlying
concept. For example, perhaps many
students who cannot solve (3x – 7 = 3)
and similar problems are confused
about how to move the – 7 to the other
side of the equation.
Also, we could discover the problems in that cluster that are particularly likely to teach that concept well,
breaking students out of the misunderstanding so that they can then solve all
the problems they previously found
so difficult. For example, perhaps students who have difficulty with
(3x – 7 = 3) and similar problems are usually
able to solve that problem when presented first with the easier problems
(x – 5 = 0) and (2x – 3 = 1).
Then we could learn paths through
clusters of problems that are particularly effective and rapid for students.
Teachers might think one concept
should always be taught before another, but what if the data shows us different? What if we reorder the problems
and students learn faster?
We could even learn personalized
and individualized paths for effective
and rapid learning. Some students
might start on a generic path, show
early mastery, and jump ahead. Others
might struggle with one type of problem or another. Each time a student
struggles, we will try them on problems
that might be a path for them to learn
the underlying concepts and succeed.
We will know these paths because so
many others struggled before, some of
whom found success.
As we experiment, as millions of students try different exercises, we forget
the paths that consistently lead to continued struggles, remember the ones
that lead to rapid mastery, and, as new
students come in, we put them on the
successful paths we have seen before.
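The experiment-and-remember loop above resembles what is often called an epsilon-greedy strategy: mostly send new students down the path with the best record so far, and occasionally try another path to keep learning. The sketch below is one possible reading of that idea, not something the post specifies; the class name, path labels, and outcomes are invented for illustration.

```python
import random


class PathSelector:
    """Epsilon-greedy choice among learning paths (illustrative only)."""

    def __init__(self, paths, epsilon=0.1, seed=None):
        self.epsilon = epsilon          # chance of exploring a random path
        self.rng = random.Random(seed)
        self.tries = {p: 0 for p in paths}
        self.successes = {p: 0 for p in paths}

    def success_rate(self, path):
        # Untried paths score 0.0, so they are reached only by exploration.
        tries = self.tries[path]
        return self.successes[path] / tries if tries else 0.0

    def best_path(self):
        return max(self.tries, key=self.success_rate)

    def choose(self):
        # Explore with probability epsilon, otherwise exploit the best path.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(list(self.tries))
        return self.best_path()

    def record(self, path, mastered):
        self.tries[path] += 1
        self.successes[path] += int(mastered)


# Hypothetical paths and outcomes, made up for this example.
selector = PathSelector(["easy_first", "hard_first"], epsilon=0.0)
for path, mastered in [
    ("easy_first", True), ("easy_first", True), ("easy_first", False),
    ("hard_first", False), ("hard_first", True), ("hard_first", False),
]:
    selector.record(path, mastered)

next_path = selector.choose()
print(next_path)
```

Setting epsilon to 0.0 here makes the choice deterministic for the example. A real system would keep some exploration so that paths with a poor early record still get an occasional second look.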
It would be student modeling on
a heretofore unseen scale. From tens
of millions of students, we automatically learn tens of thousands of models, little trails of success for future
students to follow. We experiment, try
different students on different problems, discover which exercises cause
similar difficulties, and which exercises help students break out of those
difficulties. We learn paths in the data
and models of the students. We learn
to teach.
Bertrand Meyer is a professor at ETH Zurich.
Greg Linden is the founder of Geeky Ventures.