on the following four patterns that appear to be trending in ML scholarship:
• Failure to distinguish between explanation and speculation.
• Failure to identify the sources of empirical gains (for example, emphasizing unnecessary modifications to neural architectures when gains actually stem from hyperparameter tuning).
• “Mathiness”: the use of mathematics that obfuscates or impresses rather than clarifies (for example, by confusing technical and nontechnical concepts).
• Misuse of language (for example, by choosing terms of art with colloquial connotations or by overloading established technical terms).
While the causes of these patterns are uncertain, possibilities include the rapid expansion of the community, the consequent thinness of the reviewer pool, and the often-misaligned incentives between scholarship and short-term measures of success (for example, bibliometrics, attention, and entrepreneurial opportunity). While each pattern offers a corresponding remedy (don’t do it), this article also makes suggestions on how the community might combat these troubling trends.
As the impact of machine learning widens, and the audience for research papers increasingly includes students, journalists, and policymakers, these considerations apply to this wider audience as well. By communicating more precise information with greater clarity, better ML scholarship could accelerate the pace of research, reduce the on-boarding time for new researchers, and play a more constructive role in public discourse.
Flawed scholarship threatens to mislead the public and stymie future research by compromising ML’s intellectual foundations. Indeed, many of these problems have recurred cyclically throughout the history of AI (artificial intelligence) and, more broadly, in scientific research. In 1976, Drew McDermott [26] chastised the AI community for abandoning self-discipline, warning prophetically that “if we can’t criticize ourselves, someone else will save us the trouble.” Similar discussions recurred throughout the 1980s, 1990s, and 2000s. In other fields, such as psychology, poor experimental standards have eroded trust in the discipline’s authority [33]. The current strength of machine learning owes to a large body of rigorous research to date, both theoretical and empirical. By promoting clear scientific thinking and communication, our community can sustain the trust and investment it currently enjoys.
Disclaimers. This article aims to instigate discussion, answering a call for papers from the International Conference on Machine Learning (ICML) Machine Learning Debates workshop. While we stand by the points represented here, we do not purport to offer a full or balanced viewpoint or to discuss the overall quality of science in ML. In many aspects, such as reproducibility, the community has advanced standards far beyond what sufficed a decade ago.
Note that these arguments are made by us, against us (insiders offering a critical introspective look), not as sniping outsiders. The ills identified here are not specific to any individual or institution. We have fallen into these patterns ourselves, and likely will again in the future. Exhibiting one of these patterns doesn’t make a paper bad, nor does it indict the paper’s authors; however, all papers could be made stronger by avoiding these patterns.
While we provide concrete examples, our guiding principles are to implicate ourselves and to select preferentially from the work of better-established researchers and institutions that we admire, so as to avoid singling out junior students for whom inclusion in this discussion might have consequences and who lack the opportunity to reply symmetrically. We are grateful to belong to a community that provides sufficient intellectual freedom to allow the expression of critical perspectives.
Troubling Trends
Each subsection that follows describes a trend; provides several examples (as well as positive examples that resist the trend); and explains the consequences. Pointing to weaknesses in individual papers can be a sensitive topic. To minimize this, the examples are short and specific.
Explanation vs. speculation. Research into new areas often involves exploration predicated on intuitions that have yet to coalesce into crisp formal representations. Speculation is a way for authors to impart intuitions that may not yet withstand the full weight of scientific scrutiny. Papers often offer speculation in the guise of explanations, however, which are then interpreted as authoritative because of the trappings of a scientific paper and the presumed expertise of the authors.
For instance, in a 2015 paper, Ioffe and Szegedy [18] form an intuitive theory around a concept called internal covariate shift. The exposition on internal covariate shift, starting from the abstract, appears to state technical facts. Key terms are not made crisp enough, however, to assume a truth value conclusively. For example, the paper states that batch normalization offers improvements by reducing changes in the distribution of hidden activations over the course of training. By which divergence measure is this change quantified? The paper never clarifies, and some work suggests that this explanation of batch normalization may be off the mark [37].
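To see what is missing, consider how one might even begin to quantify such a change. The sketch below is our own illustration, not anything from the paper under discussion: it simulates a hidden unit’s activations at two hypothetical training checkpoints and estimates a KL divergence between histograms. Every choice in it (which unit, which checkpoints, which divergence, how many bins) is an assumption the original explanation leaves unspecified.

```python
import numpy as np

def histogram_kl(a, b, bins=50, eps=1e-9):
    """Estimate KL(P_a || P_b) from two samples via shared-bin histograms."""
    lo, hi = min(a.min(), b.min()), max(a.max(), b.max())
    p, _ = np.histogram(a, bins=bins, range=(lo, hi))
    q, _ = np.histogram(b, bins=bins, range=(lo, hi))
    p = (p + eps) / (p + eps).sum()  # smooth, then normalize to probabilities
    q = (q + eps) / (q + eps).sum()
    return float(np.sum(p * np.log(p / q)))

# Simulated activations of one hidden unit at two training checkpoints;
# the shift in mean and scale stands in for "internal covariate shift."
rng = np.random.default_rng(0)
acts_early = rng.normal(loc=0.0, scale=1.0, size=10_000)
acts_late = rng.normal(loc=0.3, scale=1.2, size=10_000)

print(f"estimated KL divergence: {histogram_kl(acts_early, acts_late):.4f}")
```

Any such measurement is only one possible operationalization; the point is that without committing to one, the claim cannot be tested at all.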
Nevertheless, the speculative explanation given by Ioffe and Szegedy has been repeated as fact, for example, in a 2015 paper by Noh, Hong, and Han [31], which states, “It is well known that a deep neural network is very hard to optimize due to the internal-covariate-shift problem.”
We have been equally guilty of speculation disguised as explanation. In a 2017 paper with Koh and Liang [42], I (Jacob Steinhardt) wrote that “the high dimensionality and abundance of irrelevant features … give the attacker more room to construct attacks,” without conducting any experiments to measure the effect of dimensionality on attackability.
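In hindsight, even a toy calculation would have given that claim some empirical footing. The sketch below is a post-hoc illustration of our own, not an experiment from the cited paper: for a linear score w·x with roughly fixed weight norm, it computes the worst-case effect of an L∞-bounded perturbation as the input dimension grows. The weight distribution and the budget eps are arbitrary assumptions.

```python
import numpy as np

# Toy check of the claim that high dimensionality gives an attacker
# "more room": for a linear score w.x, the worst-case change under a
# perturbation with ||delta||_inf <= eps is eps * ||w||_1, achieved by
# delta = eps * sign(w). All quantities below are synthetic.
rng = np.random.default_rng(0)
eps = 0.05  # assumed per-feature perturbation budget

for d in [10, 100, 1_000, 10_000]:
    w = rng.normal(size=d) / np.sqrt(d)  # keeps ||w||_2 roughly constant
    worst_case_shift = eps * np.abs(w).sum()
    print(f"d={d:6d}  worst-case score shift = {worst_case_shift:.2f}")
```

Under these assumptions the worst-case shift grows on the order of the square root of the dimension, which is the kind of evidence the original sentence implied but never produced.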
In another paper with Liang from 2015 [41], I (Steinhardt) introduced the intuitive notion of coverage without defining it, and used it as a form of explanation (for example, “Recall that one symptom of a lack of coverage is poor estimates of uncertainty and the inability to generate high-precision predictions.”). Looking back, we desired to communicate insufficiently fleshed-out intuitions that were material to the work described in the paper and