O(n log(n)) and the rank is less than log(n). This provides
the asymptotically tightest bound on the number of entries
required for recovery, but the authors need to assume that
the singular values of the unknown low-rank matrix are all
of order unity and that the rank is less than log(n) for their
results to hold.
3.2. Toward a more general theory
More general measurement models. In our original work, we anticipated in Section 1.3 that our results would extend to the case where one observes a small number of arbitrary linear functionals of a hidden matrix M. Set N = n² and let a_1, …, a_N be an orthonormal basis for the linear space of n × n matrices with the usual inner product ⟨X, Y⟩ = trace(XᵀY). Then, we predicted that our results should also apply to the rank minimization problem

minimize rank(X) subject to ⟨a_k, X⟩ = ⟨a_k, M⟩, k ∈ Ω,    (3.1)

where Ω ⊂ {1, …, N} is selected uniformly at random. In fact, (3.1) is (2.2) when the orthobasis is the canonical basis (e_i e_jᵀ), 1 ≤ i, j ≤ n. We conjectured that those low-rank matrices that have small inner product with all the basis elements a_k may be recoverable by nuclear norm minimization.
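To make this measurement model concrete, here is a minimal Python sketch (an illustration we have added, using numpy and the cvxpy modeling package; the dimensions, rank, and number of measurements are arbitrary choices, not values from our analysis). It draws a random orthobasis, samples the functionals ⟨a_k, M⟩, and solves the nuclear norm relaxation of (3.1):

```python
import numpy as np
import cvxpy as cp

# Illustrative sizes: a hidden rank-r matrix M and a random orthobasis
# {a_k} of the space of n x n matrices (none of these values are from
# the paper's analysis).
n, r = 20, 2
N = n * n
rng = np.random.default_rng(0)
M = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))

# Rows of a random orthogonal N x N matrix, reshaped into n x n matrices,
# form an orthonormal basis under <X, Y> = trace(X^T Y).
Q, _ = np.linalg.qr(rng.standard_normal((N, N)))
basis = Q.reshape(N, n, n)

# Sample Omega uniformly at random and observe <a_k, M> for k in Omega.
m = 5 * r * n
omega = rng.choice(N, size=m, replace=False)
b = [np.trace(basis[k].T @ M) for k in omega]

# Nuclear norm surrogate for rank, subject to the sampled functionals.
X = cp.Variable((n, n))
constraints = [cp.trace(basis[k].T @ X) == b[i] for i, k in enumerate(omega)]
cp.Problem(cp.Minimize(cp.normNuc(X)), constraints).solve()
print("relative error:", np.linalg.norm(X.value - M) / np.linalg.norm(M))
```

With m comfortably above the r(2n − r) degrees of freedom of a rank-r matrix, the recovered X is typically close to M, in line with the conjecture above.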
This conjecture was proven to be true by Gross,17 where a general definition of coherence was provided, and it was shown that the same number of measurements sufficed for reconstruction under this modified definition.
Additionally, in Gross et al.,18 it was shown that the Pauli basis, studied in quantum information theory, is incoherent under this definition. This fact follows because all of the matrices in the Pauli basis are mutually orthogonal and unitary. Hence, the Pauli basis is a deterministic collection of matrices such that a random subset of these matrices can be used to reconstruct any low-rank matrix. This result has been applied to propose new methods in quantum-state tomography, where one aims to determine the state of a quantum mechanical system with as few measurements or experiments as possible.
Matrix completion with noise. All of the results described
above concern the problem of exact matrix completion,
where we have perfect information about the entries
of the matrix to be reconstructed. Of course, in almost
all real-world problems, we can only gain access to noisy
samples of the entries of the matrices we would like to
recover. Fortunately, many authors have investigated
the stability of matrix completion when the observations
are noisy. The first such result6 uses a stability argument
based on convexity to guarantee accurate recovery from
noisy data. Several subsequent works studied this problem under different matrix models. The work in Keshavan et al.22 gives near-optimal bounds provided the unknown matrix obeys additional assumptions, namely that its singular values are all roughly the same size. In Negahban and Wainwright,28 error bounds are derived provided the matrix is not spiky; that is to say, assuming that all of its entries have about the same magnitude. We additionally invite interested readers to peruse Koltchinskii et al.,23 which very recently introduced powerful results under yet another slightly different matrix model.
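As an added illustration of the noisy setting (a sketch in the spirit of the stability results above, not an implementation from any of the cited works), one can relax the equality constraints of the noiseless problem to a Frobenius-norm data-fit constraint; the noise level, sampling rate, and tolerance delta below are arbitrary, untuned choices:

```python
import numpy as np
import cvxpy as cp

# Illustrative problem: observe roughly half the entries of a rank-r
# matrix, corrupted by additive Gaussian noise of level sigma.
n, r, sigma = 30, 2, 0.1
rng = np.random.default_rng(1)
M = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))
mask = (rng.random((n, n)) < 0.5).astype(float)       # the sample set Omega
Y = mask * (M + sigma * rng.standard_normal((n, n)))  # noisy observations

# Relax exact interpolation to a Frobenius-norm data-fit ball;
# delta is a rough estimate of the total noise on Omega.
delta = sigma * np.sqrt(mask.sum())
X = cp.Variable((n, n))
fit = cp.norm(cp.multiply(mask, X) - Y, "fro")
cp.Problem(cp.Minimize(cp.normNuc(X)), [fit <= delta]).solve()
print("relative error:", np.linalg.norm(X.value - M) / np.linalg.norm(M))
```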
Algorithmic innovations. While it was known that the nuclear norm problem could be efficiently solved by semidefinite programming, the results of Recht et al.30 and the full version of this paper have inspired the development of many special-purpose algorithms to rapidly minimize the nuclear norm. For example, in Cai et al.4 and Ma et al.,27 the authors show that many of the fast first-order methods developed for compressed sensing problems can be adapted to solve large-scale matrix completion problems. These algorithms are projected gradient algorithms which operate by alternately correcting the predictions on the observed entries and soft-thresholding the singular values of the iterate (a bare-bones version of this iteration is sketched below). Using accelerated gradient schemes, Ji and Ye20 and Toh and Yun34 have improved these algorithms to provide very fast implementations of nuclear norm minimization. These codes enable the solution of problems with hundreds of thousands of rows and columns in a few hours on a standard workstation. Moreover, the numerical experiments in these works confirm that nuclear-norm minimization can successfully recover very large low-rank matrices from a number of samples on the order of 3 to 5 times the number of latent degrees of freedom. Additional experiments demonstrate that low-rank matrices can be robustly recovered under significant additive noise using nuclear-norm minimization.
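The sketch below illustrates the iteration just described, in the spirit of these singular value thresholding schemes (the threshold tau, step size, and iteration count are illustrative and untuned; the cited implementations are considerably more refined):

```python
import numpy as np

def svt_complete(Y, mask, tau=5.0, step=1.0, iters=300):
    """Alternate a gradient step on the observed entries with
    soft-thresholding of the singular values of the iterate."""
    X = np.zeros_like(Y)
    for _ in range(iters):
        G = X + step * mask * (Y - X)          # correct observed entries
        U, s, Vt = np.linalg.svd(G, full_matrices=False)
        s = np.maximum(s - tau, 0.0)           # soft-threshold the spectrum
        X = (U * s) @ Vt
    return X

rng = np.random.default_rng(2)
n, r = 50, 3
M = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))
mask = (rng.random((n, n)) < 0.4).astype(float)
X = svt_complete(mask * M, mask)
print("relative error:", np.linalg.norm(X - M) / np.linalg.norm(M))
```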
3.3. From sparsity to rank and beyond
In concert with Recht et al.,30 our work on matrix completion crystallized some of the foundational ideas of compressed sensing. We were able to extend the notion of sparsity to the much more general concept of matrix rank, and to situate the main ideas of compressed sensing in a dramatically broader context.
One exciting new development since the appearance of our original paper shows that the notions of sparsity and rank are in some sense orthogonal. If a matrix can be written as the sum of a low-rank matrix and a sparse matrix, then these two components can be identified and deconvolved from their sum. Deterministic conditions required for such an algorithm to work were provided in Chandrasekaran et al.12 A randomized analysis in Candès et al.5 provided sharper recovery guarantees and furnished a new method for Robust Principal Components Analysis, demonstrating that principal components could be constructed even in the presence of a large number of outliers.
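A minimal version of this decomposition can again be written as a convex program. The sketch below is an added illustration with arbitrary dimensions and sparsity level; the weight λ = 1/√n is the choice analyzed in Candès et al.5

```python
import numpy as np
import cvxpy as cp

# Illustrative data: a rank-r matrix plus a sparse matrix of large,
# randomly placed corruptions.
n, r = 30, 2
rng = np.random.default_rng(3)
L0 = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))
S0 = 10.0 * rng.standard_normal((n, n)) * (rng.random((n, n)) < 0.05)
Y = L0 + S0

# Trade off the nuclear norm of L against the l1 norm of S;
# lam = 1/sqrt(n) is the weight analyzed in Candes et al.5
lam = 1.0 / np.sqrt(n)
L, S = cp.Variable((n, n)), cp.Variable((n, n))
objective = cp.Minimize(cp.normNuc(L) + lam * cp.sum(cp.abs(S)))
cp.Problem(objective, [L + S == Y]).solve()
print("low-rank error:", np.linalg.norm(L.value - L0) / np.linalg.norm(L0))
```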
Moreover, the results of Chandrasekaran et al.12 were extended to provide convex algorithms for identifying Gaussian graphical models in Chandrasekaran et al.10 Prior art in this area had resorted to nonconvex heuristics based on Expectation–Maximization with no provable guarantees. It is quite surprising that, under very modest assumptions, a convex algorithm can solve a hidden variable estimation problem in multivariate statistics.