By Adam Tauman Kalai, Ankur Moitra, and Gregory Valiant
1. INTRODUCTION
The Gaussian mixture model (GMM) is one of the oldest
and most widely used statistical models. It consists of a
weighted combination of heterogeneous Gaussian sources. As
a simple one-dimensional example, consider measurements
of heights of adults in a certain population, where the distribution of heights can be closely approximated as a mixture of two
univariate Gaussians, one for males and one for females.3 Can
one recover the parameters of the Gaussians from unlabeled
height measurements alone (with no gender information)?
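As a concrete illustration (ours, not from the original analysis), the following minimal Python sketch shows how such unlabeled data arises; the means, standard deviations, and mixing weights are illustrative guesses, not values fitted to the NHANES data:

    import numpy as np

    rng = np.random.default_rng(0)

    # Illustrative parameters (guesses, not fitted to NHANES): mean and
    # standard deviation of female and male heights in cm, plus the
    # mixing weights of the two components.
    means = np.array([162.0, 176.0])
    stds = np.array([7.0, 7.0])
    weights = np.array([0.5, 0.5])

    # Draw a hidden gender label for each sample, then a height from the
    # corresponding Gaussian; discarding the labels yields exactly the
    # unlabeled data the question asks about.
    n = 10_000
    labels = rng.choice(2, size=n, p=weights)
    heights = rng.normal(means[labels], stds[labels])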
Our work focuses on the case where the mixture consists
of a small but unknown number of Gaussian “components”
that may overlap—the combined density may even have a
single peak, as in the height example, and the dimensionality may be high. Much of the previous work on this problem
attempts to learn the parameters through clustering, and
consequently needs to make a strong separation assumption on the components in the mixture. The primary contribution of our research is to avoid this assumption by instead
basing our learning algorithm upon the algebraic structure
of the mixture. Our algorithm succeeds even if the components overlap almost entirely, in which case accurate clustering is no longer possible.
We give a simple notion of “condition number” of a GMM
which characterizes its complexity up to polynomial factors.
Generally speaking, the conclusion is that the sample complexity and computational complexity of this general problem
are in every way polynomial, except for the dependence on
the number of Gaussians, which is necessarily exponential.
Statisticians have long known that, given random samples from a GMM, it is possible to identify the Gaussians in
the limit: one can eventually recover to arbitrary precision
each Gaussian component’s mean, variance, and proportion, given sufficiently many examples.14 However, their
analysis provides no bounds on convergence rate—as the
number of samples increases, the convergence might be
logarithmically slow even for two Gaussians in one dimension.
Moreover, even if the problem has reasonable sample complexity, it is not clear, a priori, how to obtain a computationally
efficient algorithm. In practice, the heuristics in widespread
use, such as the EM algorithm, suffer from local optima and
have weak theoretical guarantees.
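For concreteness, here is a minimal sketch (ours) of a single EM update for a univariate GMM; it follows the textbook form of the EM heuristic, not any code from the paper:

    import numpy as np
    from scipy.stats import norm

    def em_step(x, w, mu, sigma):
        """One EM update for a univariate GMM with parameters (w, mu, sigma)."""
        # E-step: posterior probability that each point belongs to each
        # component (the "responsibilities").
        dens = np.stack([wk * norm.pdf(x, mk, sk)
                         for wk, mk, sk in zip(w, mu, sigma)])
        resp = dens / dens.sum(axis=0)
        # M-step: re-estimate weights, means, and standard deviations
        # from the responsibility-weighted samples.
        nk = resp.sum(axis=1)
        w = nk / len(x)
        mu = (resp * x).sum(axis=1) / nk
        sigma = np.sqrt((resp * (x - mu[:, None]) ** 2).sum(axis=1) / nk)
        return w, mu, sigma

Iterating em_step from different starting parameters can converge to different fixed points of the likelihood, which is precisely the local-optima problem just mentioned.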
In seminal work, Dasgupta5 put forth a polynomial-time
clustering approach for learning GMMs that is provably
accurate under certain assumptions. In particular, if the
Gaussians are sufficiently “well separated” then one can
usually group together all the samples originating from the
same Gaussian. As a by-product, parameter estimation follows easily by estimating the mean, covariance matrix, and
proportion of samples from each cluster separately. In rapid
succession, increasingly powerful clustering techniques
that require “separability assumptions” between all pairs of
Gaussians have since been analyzed.1, 2, 4, 6, 11, 16
Figure 1. The Gaussian approximations of the heights of adult women (red) and men (blue). Can one recover estimates of these Gaussian components given only the aggregate data without gender labels (black)? [Raw data from NHANES; horizontal axis: height in cm.]
In some sense, the parameter estimation problem is more
fundamental than clustering, because accurate parameter
estimates can be easily used for accurate clustering. And
in the general case where the Gaussians may overlap, it is
impossible to cluster the points with any degree of confidence, just as one cannot accurately predict gender from the
fact that a person is, say, 1.7 m tall. Nonetheless, from height
data alone one may still recover the means, variances, and
mixing weights of the two constituent Gaussians. Hence,
while one cannot hope to cluster the individual samples,
one can decompose the Gaussian mixture and efficiently
disentangle the Gaussian components.
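To illustrate why accurate parameters make clustering as accurate as it can be, the hypothetical sketch below computes, via Bayes’ rule, the posterior probability that a sample came from each component, reusing the illustrative height parameters from above:

    import numpy as np
    from scipy.stats import norm

    def posterior(x, w, mu, sigma):
        # Bayes' rule: probability that sample x was generated by each
        # component, given (estimated) mixture parameters.
        dens = np.array([wk * norm.pdf(x, mk, sk)
                         for wk, mk, sk in zip(w, mu, sigma)])
        return dens / dens.sum()

    # Near 170 cm the two components are comparably likely, so no rule
    # can classify such a sample with much confidence.
    print(posterior(170.0, [0.5, 0.5], [162.0, 176.0], [7.0, 7.0]))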
1.1. Our approach
We describe a polynomial-time GMM learning algorithm;
we emphasize that, throughout, we favor clarity of presentation and ease of analysis at the cost of practicality.
The algorithm is based on the random projection method
(see, e.g., Vempala15). Since the projection of a multinormal distribution onto one dimension is a normal distribution, the projection of a GMM is a GMM in one dimension.
Roughly, the algorithm proceeds by projecting the data onto
a sequence of vectors, solving the associated sequence of
one-dimensional learning problems, and then reconstructing the solution to the original high-dimensional GMM.
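A hypothetical sketch of the projection step follows. The synthetic data and parameter choices are ours, and the one-dimensional solver, which is the heart of the algorithm, is not shown:

    import numpy as np

    rng = np.random.default_rng(1)

    # Synthetic stand-in data: n samples from a two-component GMM in d
    # dimensions (the actual algorithm receives such samples as input).
    d, n = 20, 5000
    samples = np.concatenate([
        rng.multivariate_normal(np.zeros(d), np.eye(d), n // 2),
        rng.multivariate_normal(np.ones(d), 2.0 * np.eye(d), n // 2),
    ])

    # A component N(mu, Sigma) projects onto a unit vector r as the
    # univariate Gaussian N(r @ mu, r @ Sigma @ r), so the projected
    # data set is itself a sample from a one-dimensional GMM.
    r = rng.normal(size=d)
    r /= np.linalg.norm(r)
    projected = samples @ r  # input for a one-dimensional GMM learner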
There are several obstacles that must be surmounted
to carry out this approach. First and foremost is the
question of solving the problem in one dimension. One-
dimensional problems are often easy if one is willing to
This work appeared as “Efficiently Learning Mixtures
of Two Gaussians” (Kalai, Moitra, Valiant, STOC 2010)
and “Settling the Polynomial Learnability of Mixtures of
Gaussians” (Moitra, Valiant, FOCS 2010).