fewer degrees of freedom than the size of M would suggest.
The question is then whether M can be recovered from a
suitably chosen sampling of its entries without collecting
n² measurements.
In this paper, we demonstrate that most low-rank
matrices can indeed be recovered from a very sparse
sampling of their entries. In Section 2, we summarize
the main results of our paper, highlighting the necessary
assumptions, algorithmic ingredients, and theoretical
foundations of reconstructing matrices from a given
collection of entries. In Section 3, we survey the subsequent developments in this area, including refinements
and important extensions of our theory. We close with a
discussion of further progress and advances in low-rank
and sparse modeling.
2. MATRIX COMPLETION
Which matrices?
In general, one cannot hope to be able to recover a low-rank matrix from a sample of its entries. Consider the rank 1 matrix M equal to

M = e1xT   (2.1)

where here and throughout, ei is the ith canonical basis vector in Euclidean space (the vector with all entries equal to 0 but the ith equal to 1). The matrix M has the entries of x along its first row and all the other entries are 0. Clearly, this matrix cannot be recovered from a sampling of its entries unless we see all of the entries in the first row. As another example, the matrix e1enT has a 1 in the (1, n) entry and 0s everywhere else. If we do not see this upper-right corner, then we cannot distinguish the matrix from the all-zeros matrix.
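To see the ambiguity concretely, consider the following small numerical illustration (our own sketch, not code from the paper): two distinct rank 1 matrices of the form (2.1) that present identical data on any sampling set missing a first-row location.

import numpy as np

n = 5
x = np.arange(1.0, n + 1)           # arbitrary entries for the first row
x_alt = x.copy()
x_alt[0] = -7.0                     # change only the (1, 1) entry

e1 = np.zeros(n)
e1[0] = 1.0
M = np.outer(e1, x)                 # M = e1 x^T: x along row 1, zeros elsewhere
M_alt = np.outer(e1, x_alt)         # a different rank 1 matrix

# Any sampling set that misses entry (1, 1) sees identical data for both:
observed = [(i, j) for i in range(n) for j in range(n) if (i, j) != (0, 0)]
print(all(M[i, j] == M_alt[i, j] for i, j in observed))        # True
print(np.linalg.matrix_rank(M), np.linalg.matrix_rank(M_alt))  # 1 1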
Even if it is impossible to recover all low-rank matrices from a set of sampled entries, can one recover most of them? To investigate this possibility, we introduce a simple model of low-rank matrices.

Definition 2.1. Let M be a rank r matrix with SVD defined by (1.1). Then we say that M belongs to the random orthogonal model if the family {uk}1≤k≤r is selected uniformly at random among all families of r orthonormal vectors, and similarly for {vk}1≤k≤r. The two families may or may not be independent of each other. We make no assumptions about the singular values σk.

If a matrix is sampled from the random orthogonal model, then we would expect most of the entries to be nonzero. This model is convenient in the sense that it is both very concrete and simple, and useful in the sense that it will help us fix the main ideas. In the sequel, however, we will consider far more general models. The question for now is whether or not one can recover such a generic matrix from a sampling of its entries.
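The paper does not prescribe an implementation of this model, but a minimal sketch is easy to write down: orthogonalizing a Gaussian matrix produces a family of r orthonormal vectors that is uniformly distributed (up to a sign convention on the QR factor). The function below and its parameters are our own illustration.

import numpy as np

def random_orthogonal_model(n, r, singular_values, seed=None):
    """Sample M = U diag(sigma) V^T with {u_k} and {v_k} each drawn
    uniformly at random among families of r orthonormal vectors."""
    rng = np.random.default_rng(seed)
    # QR of an n-by-r Gaussian matrix yields uniformly distributed
    # orthonormal columns (Haar measure, modulo signs on R's diagonal).
    U, _ = np.linalg.qr(rng.standard_normal((n, r)))
    V, _ = np.linalg.qr(rng.standard_normal((n, r)))
    return U @ np.diag(singular_values) @ V.T

M = random_orthogonal_model(n=50, r=3, singular_values=[10.0, 5.0, 1.0], seed=0)
print(np.linalg.matrix_rank(M))        # 3
print(np.mean(np.abs(M) > 1e-12))      # ~1.0: essentially every entry is nonzero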
Which sampling sets?
Clearly, one cannot hope to reconstruct any low-rank matrix M, even of rank 1, if the sampling set avoids any column or row of M. Suppose that M is of rank 1 and of the form xyT, x, y ∈ Rn, so that the (i, j) entry is given by Mij = xi yj. Then, if we do not have samples from the first row, one can never infer the value of the first component x1, as no information about x1 is observed. There is, of course, nothing special about the first row, and this argument extends to any row or column. To have any hope of recovering an unknown matrix, one needs access to at least one observation per row and one observation per column.

This example demonstrates that there are sampling sets for which one cannot recover even matrices of rank 1. But what happens for typical sampling sets? Can one recover a low-rank matrix from almost all sampling sets of cardinality m? Formally, suppose that the set Ω of locations corresponding to the observed entries ((i, j) ∈ Ω if Mij is observed) is a set of cardinality m sampled uniformly at random. Then, can one recover a generic low-rank matrix M, perhaps with very large probability, from the knowledge of the value of its entries in the set Ω?
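As an illustrative sketch (ours, with hypothetical helper names), one can draw such a set Ω of cardinality m uniformly at random and test the necessary condition derived above, namely that Ω touches every row and every column:

import numpy as np

def sample_omega(n, m, seed=None):
    """Draw m entry locations (i, j) of an n-by-n matrix uniformly at
    random, without replacement."""
    rng = np.random.default_rng(seed)
    flat = rng.choice(n * n, size=m, replace=False)
    return [(k // n, k % n) for k in flat]

def hits_every_row_and_column(omega, n):
    rows = {i for i, _ in omega}
    cols = {j for _, j in omega}
    return len(rows) == n and len(cols) == n

n, m = 100, 3000
omega = sample_omega(n, m, seed=0)
print(hits_every_row_and_column(omega, n))  # True with high probability here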
Which algorithm?
If the number of measurements is sufficiently large, and
if the entries are close to uniformly distributed, one might
hope that there is only one low-rank matrix with these
entries. If this were true, one would want to recover the data
matrix by solving the optimization problem
minimize    rank(X)
subject to  Xij = Mij,  (i, j) ∈ Ω        (2.2)
where X is the decision variable and rank(X) is equal to the rank of the matrix X. The program (2.2) is a common-sense approach which simply seeks the simplest explanation fitting the observed data. If there were only one low-rank object fitting the data, the solution of (2.2) would recover M perfectly. This is, unfortunately, of little practical use, because not only is this optimization problem NP-hard, but all known algorithms which provide exact solutions require time doubly exponential in the dimension n of the matrix, in both theory and practice.
If a matrix has rank r, then it has exactly r nonzero singular values, so that the rank function in (2.2) is simply the number of nonvanishing singular values. In this paper, we consider an alternative which minimizes the sum of the singular values over the constraint set. This sum is called the nuclear norm,

||X||* = Σ_{k=1}^n σk(X)
where, here and below, σk(X) denotes the kth largest singular value of X. The heuristic optimization we study is then given by

minimize    ||X||*
subject to  Xij = Mij,  (i, j) ∈ Ω        (2.3)
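To make the heuristic concrete, here is a minimal sketch of (2.3) posed in a convex modeling tool (our illustration, not the paper's; it assumes the cvxpy package and its default conic solver are available):

import numpy as np
import cvxpy as cp

def complete_matrix(M, omega, n):
    """Solve (2.3): minimize ||X||_* subject to X agreeing with M on Omega."""
    X = cp.Variable((n, n))
    constraints = [X[i, j] == M[i, j] for (i, j) in omega]
    cp.Problem(cp.Minimize(cp.norm(X, "nuc")), constraints).solve()
    return X.value

# Both the nuclear norm and the rank are read off the singular values:
A = np.random.default_rng(0).standard_normal((5, 5))
sigma = np.linalg.svd(A, compute_uv=False)
print(sigma.sum())             # ||A||_* = sum of the singular values
print((sigma > 1e-10).sum())   # rank(A) = number of nonzero singular values

# For example, X_hat = complete_matrix(M, omega, n), with M and omega drawn
# from the sketches above, should recover M when m is sufficiently large;
# quantifying "sufficiently large" is exactly what the theory addresses.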