Q: In 1969, Marvin Minsky and Seymour Papert published a book showing that Perceptrons could not recognize important patterns. 5 What effect
did that have on the field?
The entire field faltered following publication of the Minsky-Papert
book. They proved that single-layered
perceptrons would only work as classifiers when the data was “linearly separable”—meaning that the multidimensional data space could be separated
into regions bounded by hyperplanes.
Data not clustered in this way could
not be recognized. This finding caused
interest in perceptrons to die off, until
researchers discovered that adding layers and feedback to the neural network
overcame the problem. 1–3 The multilayered perceptron (MLP) can reliably
classify patterns of data in nonlinear
spaces. More layers mean more accuracy and, of course, more computation. The modern term "deep learning" acknowledges the many-layers depth of a modern network.
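The separability point can be made concrete with the classic XOR function, which no single hyperplane can separate. The sketch below is a toy illustration (my own, in Python with NumPy, not from the article): a single-layer perceptron trained on XOR can never classify all four points, while a two-layer network with hand-set weights classifies them perfectly.

```python
import numpy as np

def step(z):
    return (z >= 0).astype(int)

# XOR: the classic data set that is not linearly separable.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])

# --- Single-layer perceptron, trained with Rosenblatt's rule ---
w, b = np.zeros(2), 0.0
for epoch in range(100):
    for xi, yi in zip(X, y):
        err = yi - step(xi @ w + b)
        w += err * xi
        b += err
# No choice of weights separates XOR, so at most 3 of 4 are correct.
single_layer_acc = (step(X @ w + b) == y).sum()

# --- Two-layer network with hand-set weights: the hidden units
# compute OR and AND; the output fires for OR-and-not-AND, i.e. XOR.
W1 = np.array([[1, 1], [1, 1]]); b1 = np.array([-0.5, -1.5])  # OR, AND
W2 = np.array([1, -1]);          b2 = -0.5
hidden = step(X @ W1 + b1)
mlp_acc = (step(hidden @ W2 + b2) == y).sum()  # all 4 correct
```

The extra layer lets the network carve the plane into the two non-convex regions XOR requires, which is exactly what a single hyperplane cannot do.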
An open problem for research today is finding the smallest number of neurons and layers to implement a given function. A classifier network, for example, takes the bitmap from a camera showing the digit "9" and outputs the ASCII code for "9". A recommender
network maps a pattern into a string
representing an action the human
might decide to take next. Netflix has
developed a neural network that takes
your entire film viewing history along
with all the ratings you and others gave
those items and returns a recommendation of a new film that you would probably rate highly.
Q: How do you program a neural network?
You don’t. You teach it to learn how
to do the function you want. Suppose
you want to teach a network a function F that maps X patterns into Y patterns. You gather a large number of
samples (X, Y), called the training set.
For each sample you use a training
algorithm to adjust the connection
weights inside the network so that the
network outputs Y when given X at its
input. There are so many neurons, connections, and possible weights that the training algorithm can successfully embed a large number of pairs (X,Y) into the network. It takes an enormous amount of computation to train a large network on a large training set. We now have the hardware power and training algorithms to do this. The trained network will implement all the trained (X,Y) combinations very reliably.
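What a training algorithm does can be sketched in miniature. The loop below adjusts the weight and bias of one linear neuron by gradient descent so that its output matches the sample (X, Y) pairs; the target function y = 2x + 1, the learning rate, and the epoch count are all assumptions for this demo, not from the article.

```python
import numpy as np

# Training set: pairs (X, Y) sampled from the function we want the
# network to learn -- here the toy target y = 2x + 1 (assumed).
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(100, 1))
Y = 2 * X + 1

# One linear neuron with weight w and bias b. Each epoch, nudge the
# parameters downhill on the mean squared error between output and Y.
w, b = 0.0, 0.0
lr = 0.1
for epoch in range(500):
    pred = X * w + b
    grad_w = 2 * np.mean((pred - Y) * X)
    grad_b = 2 * np.mean(pred - Y)
    w -= lr * grad_w
    b -= lr * grad_b

print(round(w, 2), round(b, 2))  # converges to w ≈ 2, b ≈ 1
```

A real network repeats this same weight-adjustment idea across millions of connections, which is why training demands so much computation.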
Once the network is trained, it can compute its outputs very rapidly. It has a fixed function until it is retrained.
We want to use our networks to
compute not only trained maps, but
untrained ones as well. That means computing Y=F(X) for a new pattern X not in the training set. An important question is how much trust can be put in the responses to untrained inputs.
If we keep track of the new (untrained) inputs and their correct outputs, we can run the training algorithm again with the additional training pairs. The process of training is called learning, and the process of retraining is called reinforcement learning.
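Folding new (untrained) pairs back into the training set can be sketched with the same kind of toy linear neuron used above; the target function x², the input ranges, and the training settings are illustrative assumptions, not from the article.

```python
import numpy as np

def train(X, Y, epochs=2000, lr=0.1):
    """Gradient-descent training of one linear neuron (w, b)."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        pred = X * w + b
        w -= lr * 2 * np.mean((pred - Y) * X)
        b -= lr * 2 * np.mean(pred - Y)
    return w, b

def mse(w, b, X, Y):
    return float(np.mean((X * w + b - Y) ** 2))

rng = np.random.default_rng(1)
target = lambda x: x ** 2              # function to learn (assumed)

# Initial training set covers only part of the input space.
X0 = rng.uniform(-1, 0, size=(50, 1)); Y0 = target(X0)
w, b = train(X0, Y0)

# New, untrained inputs arrive from a region never seen in training.
X_new = rng.uniform(0, 1, size=(50, 1)); Y_new = target(X_new)
before = mse(w, b, X_new, Y_new)       # poor: pure extrapolation

# Keep track of the new pairs and run the training algorithm again.
w2, b2 = train(np.vstack([X0, X_new]), np.vstack([Y0, Y_new]))
after = mse(w2, b2, X_new, Y_new)
print(before > after)                  # retraining improves the new region
```

The point of the sketch is only that rerunning the same training algorithm on the enlarged set improves the responses in the previously untrained region; it does not make the network learn on its own.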
The network does not learn on its
own. It depends on the training algorithm, which is in effect an automatic programmer.