ogy of the primate visual system leads
to good performance with respect to
computer-vision benchmarks may
suggest neuroscience is on the verge
of providing novel and useful paradigms to computer vision and perhaps to other areas of computer science as well. The feedforward model
described here can be modified and
improved by taking into account new
experimental data (such as more detailed properties of specific visual
areas like V1 25), implementing some
of its implicit assumptions (such as
learning invariances from sequences
of natural images), taking into account additional sources of visual information (such as binocular disparity
and color), and extending it to describe
the detailed dynamics of neural responses. Meanwhile, the recognition
performance of models of this general
type can be improved by exploring parameters (such as receptive field sizes
and connectivity) by, say, using computer-intensive iterations of a mutation-and-test cycle.
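A mutation-and-test cycle of the kind mentioned above can be sketched as a simple hill climb over model parameters. This is only an illustrative sketch, not the procedure used for the model: the parameter names (`rf_size`, `n_inputs`), their ranges, and the `toy_score` function are hypothetical stand-ins, and in practice the score would be recognition accuracy measured on a benchmark.

```python
import random

# Hypothetical search space; names and ranges are illustrative assumptions.
BOUNDS = {"rf_size": (3, 31), "n_inputs": (1, 64)}


def toy_score(params):
    """Stand-in for benchmark accuracy; peaks at an arbitrary setting."""
    return -((params["rf_size"] - 11) ** 2) - ((params["n_inputs"] - 16) ** 2)


def mutate(params, rng):
    """Perturb one randomly chosen parameter, clipped to its bounds."""
    child = dict(params)
    name = rng.choice(sorted(BOUNDS))
    lo, hi = BOUNDS[name]
    step = rng.choice([-2, -1, 1, 2])
    child[name] = min(hi, max(lo, child[name] + step))
    return child


def mutation_and_test(start, score_fn, n_iters=500, seed=0):
    """Repeatedly mutate the best setting and keep the mutant only if it scores higher."""
    rng = random.Random(seed)
    best, best_score = dict(start), score_fn(start)
    for _ in range(n_iters):
        candidate = mutate(best, rng)
        candidate_score = score_fn(candidate)
        if candidate_score > best_score:  # the "test" step
            best, best_score = candidate, candidate_score
    return best, best_score


if __name__ == "__main__":
    start = {"rf_size": 3, "n_inputs": 64}
    print(mutation_and_test(start, toy_score))
```

Because a mutant is accepted only when it scores strictly higher, the returned score can never be worse than the starting one. The cost is dominated by the scoring call, which is why such searches are computer-intensive when each test means retraining and evaluating a full model.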
However, it is important to realize
the intrinsic limitations of the specific
computational framework we have
described and why it is at best a first
step toward understanding the visual
cortex. First, from the anatomical and
physiological point of view, the class of
feedforward models we have described
here is incomplete, as it does not account for the massive back-projections
found in the cortex. To date, the role
of cortical feedback remains poorly
understood. It is likely that feedback
underlies top-down signals related to
attention, task-dependent biases, and
memory. Back-projections must also
be taken into account in order to describe visual perception beyond the
Given enough time, humans use
eye movement to scan images, and
performance in many object-recognition
tasks improves significantly
over that obtained during quick presentations.
Extensions of the model
to incorporate feedback are possible
and under way. 2 Feedforward models
may well turn out to be approximate
descriptions of the first 100msec–
200msec of the processing required by
more complex theories of vision based
on back-projections. 3, 5, 7, 8, 14, 22, 31 However,
the computations involved in
the initial phase are nontrivial but essential
for any scheme involving feedback.
A related point is that normal
visual perception is much more than
classification, as it involves interpreting
and parsing visual scenes. In this
sense, the class of models we describe
is limited, since it deals only with classification
tasks. More complex architectures
are needed; see Serre et al. 26
for a discussion.
We thank Jake Bouvrie for his useful
feedback on the manuscript, as well
as the referees for their valuable comments.
1. Bengio, Y. and LeCun, Y. Scaling learning algorithms
towards AI. In Large-Scale Kernel Machines, L.
Bottou, O. Chapelle, D. DeCoste, and J. Weston,
eds. MIT Press, Cambridge, MA, 2007, 321–360.
2. Chikkerur, S., Serre, T., Tan, C., and Poggio, T. What
and where: A Bayesian inference theory of attention
(in press). Vision Research, 2010.
3. Dean, T. A computational model of the cerebral
cortex. In Proceedings of the 20th National
Conference on Artificial Intelligence (Pittsburgh, PA,
July 9–13, 2005), 938–943.
4. DiCarlo, J.J. and Cox, D.D. Untangling invariant object
recognition. Trends in Cognitive Sciences 11, 8 (Aug.
5. Epshtein, B., Lifshitz, I., and Ullman, S. Image
interpretation by a single bottom-up top-down cycle.
Proceedings of the National Academy of Sciences 105,
38 (Sept. 2008), 14298–14303.
6. Fukushima, K. Neocognitron: A self-organizing
neural network model for a mechanism of pattern
recognition unaffected by shift in position. Biological
Cybernetics 36, 4 (Apr. 1980), 193–202.
7. George, D. and Hawkins, J. A hierarchical Bayesian
model of invariant pattern recognition in the visual
cortex. In Proceedings of the International Joint
Conference on Neural Networks 3, (Montréal, July
31–Aug. 4). IEEE Press, 2005, 1812–1817.
8. Grossberg, S. Towards a unified theory of neocortex:
Laminar cortical circuits for vision and cognition.
Progress in Brain Research 165 (2007), 79–104.
9. Hegdé, H. and Felleman, D.J. Reappraising the
functional implications of the primate visual
anatomical hierarchy. The Neuroscientist 13, 5 (2007),
10. Heisele, B., Serre, T., and Poggio, T. A component-based framework for face detection and identification.
International Journal of Computer Vision 74, 2 (Jan.
1, 2007), 167–181.
11. Hinton, G.E. Learning multiple layers of
representation. Trends in Cognitive Sciences 11, 10
(Oct. 2007), 428–434.
12. Hung, C.P., Kreiman, G., Poggio, T., and DiCarlo, J.J.
Fast read-out of object identity from macaque inferior
temporal cortex. Science 310, 5749 (Nov. 4, 2005),
Thomas Serre (firstname.lastname@example.org) is an
assistant professor in the Department of Cognitive,
Linguistic & Psychological Sciences at Brown University,
Tomaso Poggio (email@example.com) is the Eugene
McDermott Professor in the Department of Brain and
Cognitive Sciences in the McGovern Institute for Brain
Research at the Massachusetts Institute of Technology,