non-falsifiable due to our inability to perform the requisite experiments in the brain.7,40
Figure 2 illustrates one potential
path by which more sophisticated neural algorithms could emerge going forward. Given ongoing advances in neuroscience, it is reasonable to expect that the
current deep learning revolution could
be followed by complementary revolutions in other cognitive domains. Each
of these novel capabilities will build
on its predecessors; it is unlikely that any of these algorithms will ever go away. Rather, each will continue to serve as an input substrate for more sophisticated cognitive functions. Indeed, in
most of the following examples, there
is currently a deep learning approach
to achieving that capability (noted in
the second column). Importantly, this description avoids an explicit judgment about the relative value of neural-inspired and neural-plausible representations; the neuroscience community
has long seen value in representing
cognitive function at different levels
of neural fidelity. Of course, because computing applications have goals distinct from those of biological brains, some level of abstraction will likely often outperform algorithms that rely on strict mimicry; for theoretical development, however, researchers will likely find it more effective to represent neural computation in a more biologically plausible fashion.
1. Feed-forward sensory processing.
The use of neural networks to computationally solve tasks associated with human vision and other sensory systems is not new. While there have
been a few key theoretical advances, at
their core deep learning networks, such
as deep convolutional networks, are not
fundamentally different from ideas being developed in the 1980s and 1990s.
As mentioned earlier, the success of deep networks in the past decade has been driven in large part by the availability of sufficiently rich datasets, as well as by the recognition that modern computing technologies, such as GPUs, are effective at training networks at large scale.
In many ways, the advances that have
enabled deep learning have simply allowed ANNs to realize the potential
that the connectionist cognitive science
community has been predicting for several decades.
From a neuroscience perspective,
deep learning’s success is both promising and limited. The pattern classification function that deep networks excel at is only a very narrow example of
cognitive functionality, albeit one that
is quite important. The inspiration it
takes from the brain is quite restricted
as well. Deep networks are arguably
inspired by neuroscience that dates to
the 1950s and 1960s, with the recognition by neuroscientists like Vernon
Mountcastle, David Hubel, and Torsten Wiesel that early sensory cortex
is modular, hierarchical, and has representations that start simple in early
layers and become progressively more
complex. While these critical findings have helped frame cortical research for decades, the neuroscience community has built on them in many ways that have yet to be integrated into machine learning.
One such example, described here, is
the importance of time.
2. Temporal neural networks. We
appear to be at a transition point in the
neural algorithm community. Today,
much of the research around neural algorithms is focused on extending methods derived from deep learning to operate with temporal components. These
methods, including techniques such
as long short-term memory (LSTM), are quickly beginning to surpass the state of the art on more time-dependent tasks such as audio processing.23 Similar to more conventional deep networks, many of
these time-based methods leverage relatively old ideas in the ANN community
around using network recurrence and
local feedback to represent time.
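To make the contrast with purely feed-forward networks concrete, the sketch below uses PyTorch's LSTM module, in which gated local feedback carries a hidden state across time steps; the feature, hidden, and class dimensions are illustrative assumptions.

```python
# A minimal sketch of an LSTM sequence classifier (PyTorch). Recurrence
# and local feedback let the hidden state summarize the input over time.
# All dimensions are illustrative assumptions.
import torch
import torch.nn as nn

class SequenceClassifier(nn.Module):
    def __init__(self, n_features=40, n_hidden=128, n_classes=10):
        super().__init__()
        self.lstm = nn.LSTM(n_features, n_hidden, batch_first=True)
        self.head = nn.Linear(n_hidden, n_classes)

    def forward(self, x):
        # x: (batch, time, features), e.g., a sequence of audio frames
        _, (h_n, _) = self.lstm(x)   # h_n: final hidden state
        return self.head(h_n[-1])    # classify from the last step's state

model = SequenceClassifier()
frames = torch.randn(2, 100, 40)     # two dummy 100-step sequences
logits = model(frames)               # shape: (2, 10)
```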
While this use of time arising from local feedback is already proving powerful, it is a limited implementation of the temporal complexity within the brain. Local circuits are incredibly complex, often numerically dominating the inputs and outputs to a region and consisting of many distinct neuron types.17 The value of this local complexity likely goes far beyond the current recurrent ANN goal of maintaining a local state for some period
of time. In addition to the richness of local biological complexity, there is the consideration of what spike-based information processing means with regard to contributing information about time. While there is significant discussion of spike-based neural algorithms from the perspective of energy efficiency, less frequently noted is the ability of spiking neurons to incorporate information in the time domain. Neuroscience researchers are very familiar with aspects of neural processing for which “when” a spike occurs can be as important as whether a spike occurs at all; however, this form of information representation is uncommon in other domains.
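A minimal leaky integrate-and-fire model makes this point concrete: a stronger input drives the membrane potential to threshold sooner, so the timing of the first spike, not merely its occurrence, encodes input intensity. The constants in the sketch below are illustrative assumptions.

```python
# A minimal leaky integrate-and-fire sketch of a time code: the latency
# of the first spike carries information about input strength. All
# constants are illustrative assumptions.
def first_spike_time(input_current, threshold=1.0, leak=0.95,
                     dt=1.0, n_steps=200):
    v = 0.0                                 # membrane potential
    for step in range(n_steps):
        v = leak * v + input_current * dt   # leaky integration of input
        if v >= threshold:
            return step * dt                # latency of the first spike
    return None                             # subthreshold: no spike

for current in (0.04, 0.10, 0.30):
    print(current, "->", first_spike_time(current))
# 0.04 never reaches threshold; 0.30 spikes much earlier than 0.10.
```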
Extracting more computational capabilities from spiking and the local
circuit complexity seen in cortex and
other regions has the potential to enable temporal neural networks to continue to become more powerful and
effective in the coming years. However, the full potential of temporal neural networks will likely not be realized until they are integrated into systems that also capture the complexity of regional communication in the brain, for example, networks configured to perform both top-down and bottom-up processing simultaneously, such as neural-inspired Bayesian inference networks.
3. Bayesian neural algorithms. Perhaps even more than the treatment of time, the most
common critique from neuroscientists
about the neural plausibility of deep
learning networks is the general lack
of “top-down” projections within these
algorithms. Aside from the optic nerve projection from the retina to the lateral geniculate nucleus (LGN) of the thalamus, the classic visual processing circuit of the brain includes as
much, and often more, top-down connectivity between regions (for example, V2→V1) as it contains bottom-up
(V1→V2).
Not surprisingly, the observation
that higher-level information can influence how lower-level regions process information has strong ties to
well-established motifs of data processing based around Bayesian inference. Loosely speaking, these models
allow data to be interpreted not simply
by low-level information assembling
into higher features unidirectionally,
but also by what is expected—either
acutely based on context or historically
based on past experiences. In effect,
these high-level “priors” can bias low-level processing toward more accurate interpretations of what the input
means in a broader sense.
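A toy Bayesian update illustrates the idea: the same ambiguous bottom-up evidence is interpreted differently depending on the top-down prior. The hypothesis labels, likelihoods, and prior values below are illustrative assumptions.

```python
# A minimal sketch of a top-down prior biasing the interpretation of
# ambiguous bottom-up evidence via Bayes' rule. Values are illustrative.
def posterior(prior, likelihood):
    unnorm = {h: prior[h] * likelihood[h] for h in prior}
    z = sum(unnorm.values())
    return {h: p / z for h, p in unnorm.items()}

# Ambiguous low-level evidence: the input looks slightly more like "cat".
likelihood = {"cat": 0.55, "dog": 0.45}

# With a flat prior, the weak bottom-up preference decides.
print(posterior({"cat": 0.5, "dog": 0.5}, likelihood))   # cat favored

# With a strong contextual prior for "dog" (say, from scene context),
# the top-down expectation overrides the bottom-up preference.
print(posterior({"cat": 0.2, "dog": 0.8}, likelihood))   # dog favored
```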