Unsupervised Learning of
Hierarchical Representations
with Convolutional Deep
Belief Networks
Abstract
There has been much interest in unsupervised learning of
hierarchical generative models such as deep belief networks
(DBNs); however, scaling such models to full-sized, high-dimensional images remains a difficult problem. To address
this problem, we present the convolutional deep belief network, a hierarchical generative model that scales to realistic
image sizes. This model is translation-invariant and supports efficient bottom-up and top-down probabilistic inference. Key to our approach is probabilistic max-pooling, a
novel technique that shrinks the representations of higher
layers in a probabilistically sound way. Our experiments
show that the algorithm learns useful high-level visual features, such as object parts, from unlabeled images of objects
and natural scenes. We demonstrate excellent performance
on several visual recognition tasks and show that our model
can perform hierarchical (bottom-up and top-down) inference over full-sized images.
1. Introduction
Machine learning has been highly successful in tackling
many real-world artificial intelligence and data mining problems, such as optical character recognition, face detection,
autonomous car driving, data mining of biological data, and
Web search/information retrieval. However, the success of
machine learning systems often requires a large amount of
labeled data (which is expensive to obtain) and significant
manual feature engineering. These feature representations
are often hand-designed, require significant amounts of
domain knowledge and human labor, and do not generalize
well to new domains. Therefore, it is desirable to be able to
develop feature representations automatically while using a
small amount of labeled data.
Given these issues, we consider the problem of learning
feature representations from unlabeled data, which
we call unsupervised feature learning. Here, we are interested
in primarily using unlabeled data because we can
easily obtain a virtually unlimited amount of unlabeled
data via the Internet. In fact, even though we do not have
labels, there often exist rich structures in unlabeled data.
For example, if we look at images of a specific object (e.g.,
a face), we can easily discover high-level structures such
as object parts (e.g., face parts). Given natural images, we
may be able to discover low-level structures such as edges,
as well as high-level structures such as corners, local
curvatures, and shapes. The main assumption of unsupervised
feature learning is that such structures in unlabeled
data can be useful in machine learning tasks. For example,
if the input data have structures generated from specific
object classes (e.g., cars vs. faces), then discovering
class-specific patterns (e.g., car wheels or face parts) will be
useful for classification, possibly combined with a small
amount of labeled data. Similarly, even simple image
features (e.g., edges or corners) learned from unlabeled
natural images can be useful for object recognition tasks that
deal with completely unrelated images. In this context,
how can we discover such useful high-level features from
unlabeled data?