components; for example, it currently lacks methods
for representing and reasoning about time and space.
Hence, core AI problems of representation and tractable reasoning are also core research problems for
never-ending learning agents. In addition, recent
research in natural language has shown that working wth non-symbolic vector embeddings of words,
phrases and entities, learned via deep neural networks,
has many advantages. In NELL, the recent addition of
the LE method has similarly yielded improvements in
NELL’s ability to extract new instances of categories
and relations. However, an even more dramatic adoption of vector embeddings learned via deep networks is possible: for example, representing category and relation predicates by vectors and matrices in a continuous space, fundamentally changing the framing of the ontology extension problem (i.e., if every relation is represented by a matrix, then the set of possible matrices is the set of possible relations in the ontology).
The study of never-ending learning raises important
conceptual and theoretical problems as well, including:
• The relationship between consistency and correctness.
An autonomous learning agent can never truly perceive
whether it is correct—it can at best detect only that it is
internally consistent. For example, even if it observes
that its predictions (e.g., new beliefs predicted by
NELL’s learned Horn clauses) are consistent with what
it perceives (e.g., what NELL reads from text), it cannot
distinguish whether that observed consistency is due to
correct predictions and correct perceptions, or incorrect
predictions and correspondingly incorrect perceptions.
This is important in understanding never-ending learn-
ing, because it suggests organizing the learning agent
to become increasingly consistent over time, which is
precisely how NELL uses its consistency constraints to
guide learning. A key open theoretical question there-
fore is “under what conditions can one guarantee that
an increasingly consistent learning agent is also an
increasingly correct agent?” Platanios et al.32 provide
one step in this direction, with an approach
that will soon allow NELL to estimate its accuracy based
on the observed consistency rate among its learned
functions, but much remains to be understood about
this fundamental theoretical question.
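The intuition behind such agreement-based accuracy estimation can be seen in a simplified setting (an illustrative sketch under a conditional-independence assumption, not the actual method of the cited work): if three learned functions err independently, their pairwise agreement rates a_ij satisfy 2a_ij − 1 = (1 − 2e_i)(1 − 2e_j), giving three equations that determine the three unobserved error rates e_i:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
truth = rng.random(n) < 0.5        # ground truth, never observed by the agent

# Three learned functions with error rates unknown to the agent.
true_errors = [0.10, 0.20, 0.30]
preds = [np.where(rng.random(n) < e, ~truth, truth) for e in true_errors]

# The agent can only measure how often its functions agree with each other.
pairs = [(0, 1), (0, 2), (1, 2)]
c = {p: 2 * np.mean(preds[p[0]] == preds[p[1]]) - 1 for p in pairs}

# Under independence, c_ij = (1 - 2 e_i)(1 - 2 e_j); solve for the e_i.
d0 = np.sqrt(c[(0, 1)] * c[(0, 2)] / c[(1, 2)])
d1, d2 = c[(0, 1)] / d0, c[(0, 2)] / d0
est_errors = [(1 - d) / 2 for d in (d0, d1, d2)]  # close to true_errors
```

Real learned functions are rarely conditionally independent, which is precisely the complication that more sophisticated estimators must address.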
• Convergence guarantees in principle and in practice. A
second fundamental question for never-ending learning agents is “what agent architecture is sufficient to
guarantee that the agent can in principle generate a
sequence of self-modifications that will transform it
from its initial state to an increasingly high-performance agent, without hitting performance plateaus?”
Note this may require that the architecture support pervasive plasticity: the ability to change its representations and more. One issue here is whether the architecture has sufficient self-modification operations to allow it to produce ever-improving modifications to itself in principle. A second, related issue is whether its learning mechanisms will make these potential changes in practice, converging given a tractable amount of computation and training experience.

We thank the anonymous reviewers for their constructive comments, and thank Lucas Navarro, Bill McDowell, Oscar Romero, and Amos Azaria for help in the empirical evaluation of NELL. This research was supported in part by DARPA under contract number FA8750-13-2-0005, by NSF grants IIS-1065251 and CCF-1116892, by AFOSR award FA9550-17-1-0218, and by several generous gifts from Google and Yahoo!. We also gratefully acknowledge graduate fellowship support over the years from Yahoo, Microsoft, Google, Fulbright, CAPES, and FAPESP.
1. Balcan, M.-F., Blum, A. A PAC-style
model for learning from labeled
and unlabeled data. Proc. of COLT.
2. Bengio, Y. Learning deep architectures
for AI. Foundations and Trends in
Machine Learning 2, 1 (2009), 1–127.
3. Bengio, Y., Louradour, J.,
Collobert, R., Weston, J. Curriculum
learning. In Proceedings of the 26th
annual international conference
on machine learning (2009), ACM.
4. Blum, A., Mitchell, T. Combining
labeled and unlabeled data with co-training. Proc. of COLT (1998).
5. Brunskill, E., Leffler, B., Li, L.,
Littman, M. L., Roy, N. Corl: A
reinforcement learner. In Proceedings
of the 24th Conference on Uncertainty
in Artificial Intelligence (UAI) (2012).
6. Callan, J. ClueWeb12 data set (2013).
7. Callan, J., Hoy, M. ClueWeb09 data set (2009). http://boston.lti.cs.cmu.
8. Carlson, A., Betteridge, J., Kisiel, B.,
Settles, B., Hruschka Jr, E.R.,
Mitchell, T.M. Toward an architecture
for never-ending language learning.
Proc. of AAAI (2010).
9. Carlson, A., Betteridge, J., Wang, R. C.,
Hruschka Jr., E.R., Mitchell, T.M.
Coupled semi-supervised learning for
information extraction. Proc. of WSDM.
10. Caruana, R. Multitask learning.
Machine Learning 28 (1997), 41–75.
11. Chen, Z., Liu, B. Lifelong machine
learning. Synthesis Lectures on
Artificial Intelligence and Machine
Learning 10, 3 (2016), 1–145.
12. Chen, X., Shrivastava, A., Gupta, A.
Neil: Extracting visual knowledge
from web data. In Proceedings of
13. Craven, M., DiPasquo, D., Freitag, D.,
McCallum, A., Mitchell, T., Nigam, K.,
Slattery, S. Learning to extract
symbolic knowledge from the world
wide web. In Proceedings of the 15th
National Conference on Artificial Intelligence.
14. Dempster, A., Laird, N., Rubin, D.
Maximum likelihood from incomplete
data via the EM algorithm. J. R. Stat.
Soc. Series B (1977).
15. Donmez, P., Carbonell, J.G. Proactive
learning: cost-sensitive active
learning with multiple imperfect
oracles. In Proceedings of the 17th
ACM conference on Information and
knowledge management (2008).
16. Duarte, M.C., Hruschka Jr., E.R. How
to read the web in portuguese using
the never-ending language learner’s
principles. In Intelligent Systems
Design and Applications (ISDA), 2014
14th International Conference on
(2014), IEEE, 162–167.
17. Etzioni, O. et al. Web-scale information extraction in KnowItAll (preliminary results). In WWW (2004).
18. Etzioni, O. et al. Open information
extraction: The second generation.
Proc. of IJCAI (2011).
19. Gardner, M., Talukdar, P.,
Krishnamurthy, J., Mitchell, T.
Incorporating vector space similarity
in random walk inference over
knowledge bases. Proc. of EMNLP.
20. Krishnamurthy, J., Mitchell, T.M.
Which noun phrases denote
which concepts? Proc. of ACL.
21. Laird, J., Newell, A., Rosenbloom, P.
SOAR: An architecture for general
intelligence. Artif. Intel. 33 (1987).
22. Langley, P., McKusick, K.B., Allen, J.A.,
Iba, W. F., Thompson, K. A design for
the ICARUS architecture. SIGART
Bull. 2, 4 (1991), 104–109.
23. Lao, N., Mitchell, T., Cohen, W. W.
Random walk inference and learning
in a large scale knowledge base. Proc.
of EMNLP (2011).
24. Lenat, D. B. Eurisko: A program that
learns new heuristics and domain
concepts. Artif. Intel. 21, 1–2 (1983).
25. Maaten, L.v.d., Hinton, G. Visualizing
data using t-SNE. J. Machine Learning
Res. 9 (2008), 2579–2605.
26. Mitchell, T.M., Allen, J., Chalasani, P.,
Cheng, J., Etzioni, O., Ringuette, M. N.,
Schlimmer, J. C. THEO: A framework
for self-improving systems. Arch. for
Intel. (1991), 323–356.
27. Mitchell, T., Cohen, W., Hruschka, E.,
Talukdar, P., Betteridge, J., Carlson, A.,
Dalvi, B., Gardner, M., Kisiel, B.,
Krishnamurthy, J., Lao, N., Mazaitis, K.,
Mohamed, T., Nakashole, N.,