resources/ACM2010 for details).
Supervised learning in higher areas. After this initial developmental stage, learning a new object category requires training only the task-specific circuits at the top of the ventral-stream hierarchy. The hierarchy thus provides a position- and scale-invariant representation to task-specific circuits beyond IT, which must learn to generalize over transformations other than image-plane transformations (such as 3D rotation) anew for each object or category. For instance, pose-invariant face-categorization circuits may be built, possibly in PFC, by combining several units (possibly in IT) tuned to different face examples, including different people, views, and lighting conditions.
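The pooling described above can be sketched in a toy form. The Gaussian tuning width, the max-pooling choice, and all templates below are illustrative assumptions, not parameters from the actual model:

```python
import numpy as np

def view_tuned_response(x, template, sigma=1.0):
    """IT-like unit: Gaussian (RBF) tuning to one stored face view."""
    return np.exp(-np.sum((x - template) ** 2) / (2 * sigma ** 2))

def pose_invariant_response(x, templates, sigma=1.0):
    """PFC-like unit: pools (max) over units tuned to different views,
    so the response stays high whenever the input matches any stored view."""
    return max(view_tuned_response(x, t, sigma) for t in templates)
```

Because the pooled unit takes the maximum over its view-tuned inputs, its response remains high for any input close to one of the stored views, which is one simple way to obtain tolerance to pose.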
A default routine may be running in a default state (no specific visual task), perhaps the routine "What is there?" As an example of a simple routine, consider a classifier that receives the activity of a few hundred IT-like units tuned to examples of the target object and distractors. While learning in the model from the layers below is stimulus-driven, the PFC-like classification units are trained in a supervised way, following a perceptron-like learning rule.
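The supervised stage described above can be sketched as a classic perceptron rule operating on vectors of IT-like unit activities. The learning rate, epoch count, and data layout below are illustrative choices, not values from the model:

```python
import numpy as np

def train_perceptron(X, y, epochs=50, lr=0.1):
    """Perceptron-like supervised rule for a PFC-like classification unit.

    X: each row is a vector of IT-like unit activities for one stimulus.
    y: labels, +1 for the target object, -1 for distractors.
    """
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            # Update only on misclassified examples (the perceptron rule).
            if yi * (xi @ w + b) <= 0:
                w += lr * yi * xi
                b += lr * yi
    return w, b
```

The key property this illustrates is that only the top-level weights are adjusted; the stimulus-driven representation feeding the classifier is left untouched.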
Agreement with experimental data. Since its original development in the late 1990s,24,29 the model in Figure 2 has been able to explain a number of new experimental results, including data not used to derive or fit model parameters. The model seems to be qualitatively and quantitatively consistent with (and in some cases predicts29) several properties of subpopulations of cells in V1, V4, IT, and PFC, as well as with fMRI and psychophysical data (see the sidebar "Quantitative Data Compatible with the Model" for a complete list of findings).
We compared the performance of the model against that of human observers in a rapid animal vs. non-animal recognition task,28 for which recognition is quick and cortical back-projections may be less relevant. The results indicate the model predicts human performance quite well during such a task, suggesting the model may indeed provide a satisfactory description of the feedforward path. In particular, for this experiment we broke down the performance of the model and of the human observers into four image categories with varying amounts of clutter. Interestingly, both the model and the human observers were most accurate (~90% correct for both) on images for which the amount of information is maximal and the clutter minimal, and performance decreased monotonically as the clutter in the image increased. This decrease in performance with increasing clutter likely reflects a key limitation of this type of feedforward architecture. The result agrees with the reduced selectivity of neurons in V4 and IT when presented with multiple stimuli within their receptive fields, for which the model provides a good quantitative fit29 with neurophysiology data (see the sidebar).
Figure 2. Hierarchical feedforward models of the visual cortex.
The role of the anatomical back-projections present (in abundance) among almost all areas of the visual cortex is a matter of debate. A commonly accepted hypothesis is that the basic processing of information is feedforward,30 supported most directly by the short times required for a selective response to appear in cells at all stages of the hierarchy. Neural recordings from IT in a monkey12 show that the activity of small neuronal populations over very short time intervals (as short as 12.5ms and about 100ms after stimulus onset) contains surprisingly accurate and robust information supporting a variety of recognition tasks. While these data do not rule out local feedback loops within an area, they do suggest that a core hierarchical feedforward architecture (like the one described here) may be a reasonable starting point for a theory of the visual cortex aiming to explain immediate recognition, the initial phase of recognition before eye movements and high-level processes take place.
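A toy version of this kind of population readout can be sketched as follows. All numbers here are hypothetical: Poisson spike counts in a single short time bin stand in for real recordings, and a least-squares linear readout stands in for whatever decoder the experiments used:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 100 "IT-like" neurons, two stimulus classes,
# spike counts in one short time bin modeled as Poisson variables.
n_neurons, n_trials = 100, 200
rates = {0: rng.uniform(0.1, 1.0, n_neurons),   # mean counts per bin, class 0
         1: rng.uniform(0.1, 1.0, n_neurons)}   # mean counts per bin, class 1

X = np.vstack([rng.poisson(rates[c], (n_trials, n_neurons)) for c in (0, 1)])
y = np.repeat([0, 1], n_trials)

# Linear readout: one linear unit on the population vector (plus a bias),
# fit by least squares against +/-1 targets.
Xa = np.hstack([X, np.ones((X.shape[0], 1))])
w, *_ = np.linalg.lstsq(Xa, 2 * y - 1, rcond=None)
accuracy = np.mean((Xa @ w > 0) == (y == 1))
```

The point of the sketch is that even noisy single-bin counts from a modest population support accurate linear decoding, which is the flavor of result the recordings cited above report.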