of us, skeptics and believers, went
back to our laboratories to explore
these questions. Within a year or two,
the evidence was quite clear. For example, the R-CNN work of Girshick
et al.3 showed the KSH architecture
could be modified, by making use of
computer vision ideas such as region
proposals, to make possible state-of-the-art
object detection on PASCAL
VOC. Getting SGD to work well is an
art, but it could be mastered by students, researchers, and corporate
employees, and yield reproducible
results in many different settings.
We do not yet have a convincing theoretical proof of the robustness of SGD,
but the empirical evidence is quite
compelling, so we leave it to the theoreticians to find an explanation while
experimentalists forge ahead. We
have realized that deeper
networks generally work better, and that overfitting fears are overblown. We have
new techniques such as “batch normalization” to deal with regularization, and dropout is not so crucial anymore.
Practical applications abound.
It is my opinion that the following paper is the most impactful paper in machine learning and computer vision in
the last five years. It is the paper that
led the field of computer vision to embrace deep learning.
References
1. Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K. and Li,
F.-F. ImageNet: A Large-scale hierarchical image
database. In Proceedings of the IEEE Computer
Vision and Pattern Recognition, (June 20–25, 2009).
2. Fukushima, K. Neocognitron: A self-organizing
neural network model for a mechanism of pattern
recognition unaffected by shift in position. Biol
Cybern 34, 4 (1980), 193–202.
3. Girshick, R., Donahue, J., Darrell, T. and Malik,
J. Rich feature hierarchies for accurate object
detection and semantic segmentation. In
Proceedings of the IEEE Computer Vision and
Pattern Recognition, (2014).
4. Hubel, D.H. and Wiesel, T.N. Receptive fields,
binocular interactions and functional architecture
in the cat’s visual cortex. J. Physiology 160, 1 (Jan.
1962), 106–154.
5. Hubel, D.H. and Wiesel, T.N. Receptive fields and
functional architecture of monkey striate cortex. J.
Physiology 195, 1 (Mar. 1968), 215–243.
6. LeCun, Y. et al. Backpropagation applied to
handwritten zip code recognition. Neural
Computation 1 (1989), 541–551.
7. Rumelhart, D.E., Hinton, G.E. and Williams, R.J.
Learning representations by back-propagating
errors. Nature 323 (Oct. 9, 1986), 533–536.
8. Werbos, P. Beyond regression: New tools for
prediction and analysis in the behavioral sciences.
Ph.D. thesis, Harvard University, 1974.
Jitendra Malik is the Arthur J. Chick Professor of
EECS at the University of California at Berkeley.
Copyright held by author.