Figure 2. Recovery of full matrices from their entries. For each (n, m, r) triple, we repeated the following procedure 50 times. A matrix M
of rank r and a subset of m entries were selected at random. Then, we solved the nuclear norm minimization for X subject to Xij = Mij on the
selected entries. We declared M to be recovered if ‖Xopt − M‖F/‖M‖F < 10^−3. The results are shown for (a) general 50 × 50 matrices and (b) 50 × 50
positive semidefinite matrices. The color of each cell reflects the empirical recovery rate (scaled between 0 and 1). White denotes perfect recovery
in all experiments, and black denotes failure for all experiments.
[Figure 2: two heat maps of empirical recovery rate; y-axis dr/m in both panels, x-axis m/n² in panel (a) and m/Dn in panel (b), with both axes ranging from 0 to 1.]
The x-axis corresponds to the fraction of revealed entries, m/n², and the y-axis to the ratio between the dimension of the set of rank r matrices, dr = r(2n − r), and the number of measurements m.
Note that both axes range from 0 to 1: a value greater than 1 on
the x-axis corresponds to an overdetermined linear system, where the semidefinite program always succeeds, and
a value greater than 1 on the y-axis corresponds to the regime in which there are
infinitely many rank r matrices agreeing with the provided entries.
The color of each cell in the figures reflects the empirical recovery rate of the 50 runs (scaled between 0 and 1).
Interestingly, the experiments reveal very similar plots for
different n, suggesting that our theoretical upper bounds on
recovery may be rather conservative.
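To make this experimental procedure concrete, the following is a minimal sketch of a single trial for general matrices, written in Python with numpy and cvxpy; the toolchain is our assumption, as the paper does not prescribe a solver.

```python
import numpy as np
import cvxpy as cp

def recovery_trial(n=50, r=4, m=1000, tol=1e-3, seed=0):
    """One trial: sample a random rank-r matrix, reveal m entries,
    and check whether nuclear norm minimization recovers it."""
    rng = np.random.default_rng(seed)
    # Random rank-r matrix M with i.i.d. Gaussian factors.
    M = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))
    # Select a subset of m entries uniformly at random, without replacement.
    idx = rng.choice(n * n, size=m, replace=False)
    rows, cols = np.unravel_index(idx, (n, n))
    # Minimize the nuclear norm subject to Xij = Mij on the sampled entries.
    X = cp.Variable((n, n))
    constraints = [X[i, j] == M[i, j] for i, j in zip(rows, cols)]
    cp.Problem(cp.Minimize(cp.normNuc(X)), constraints).solve()
    # Declare M recovered if the relative Frobenius error is below 10^-3.
    return np.linalg.norm(X.value - M, "fro") / np.linalg.norm(M, "fro") < tol
```

Averaging 50 such trials per (n, m, r) triple yields one cell of Figure 2(a).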
For a second experiment, we generated random positive semidefinite matrices and tried to recover them from
their entries using the nuclear norm heuristic. As above, we repeated the same procedure 50 times for each (n, m, r) triple.
We generated M, an n × n positive semidefinite matrix of rank r, by sampling an n × r factor MF
with i.i.d. Gaussian entries and setting M = MF MF^T. We
sampled a subset Ω of m entries uniformly at random.
Then, we solved the nuclear norm minimization problem with the additional constraint that the decision variable be positive semidefinite. Figure 2(b) shows the results
of these experiments for n = 50. The x-axis again corresponds to the fraction of the entries of the matrix that
are revealed to the solver, but, in this case, the number of
measurements is divided by Dn = n(n + 1)/2, the number
of unique entries in a positive semidefinite matrix, and
the dimension of the set of rank r matrices is dr = nr − r(r − 1)/2.
The color of each cell is chosen in the same fashion as
in the experiment with full matrices. Interestingly, the
recovery region is much larger for positive semidefinite
matrices, and future work is needed to investigate whether the
theoretical scaling is also more favorable in this scenario
of low-rank matrix completion.
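A sketch of one trial of the positive semidefinite variant, under the same assumed toolchain, differs only in how M is generated and in the semidefinite constraint on the decision variable.

```python
import numpy as np
import cvxpy as cp

def psd_recovery_trial(n=50, r=4, m=600, tol=1e-3, seed=0):
    rng = np.random.default_rng(seed)
    # M = MF MF^T with an i.i.d. Gaussian n x r factor, so M is PSD of rank r.
    MF = rng.standard_normal((n, r))
    M = MF @ MF.T
    # Sample m of the Dn = n(n + 1)/2 unique entries (upper triangle).
    rows, cols = np.triu_indices(n)
    keep = rng.choice(rows.size, size=m, replace=False)
    # Nuclear norm minimization over positive semidefinite matrices.
    X = cp.Variable((n, n), PSD=True)
    constraints = [X[i, j] == M[i, j] for i, j in zip(rows[keep], cols[keep])]
    cp.Problem(cp.Minimize(cp.normNuc(X)), constraints).solve()
    return np.linalg.norm(X.value - M, "fro") / np.linalg.norm(M, "fro") < tol
```

Since the nuclear norm of a positive semidefinite matrix equals its trace, cp.trace(X) would be an equivalent and cheaper objective here.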
These phase transition diagrams reveal a considerably
smaller recovery region of parameter space than the Gaussian measurement model
studied in Recht et al.30 In the experiments in Recht et al.,30
M was generated in the same fashion as above, but, in
place of sampling entries, we generated m random Gaussian
projections of the data (see the discussion in Section 2.4).
In these experiments, the recovery regime is far larger than
that in the case of sampling entries, but this is not particularly surprising, as each Gaussian observation measures a
contribution from every entry in the matrix M.
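For contrast, a sketch of one trial under this Gaussian measurement model, again with numpy/cvxpy as an assumed toolchain: each observation is the inner product of M with an i.i.d. Gaussian matrix, so it touches every entry.

```python
import numpy as np
import cvxpy as cp

def gaussian_trial(n=50, r=4, m=1000, tol=1e-3, seed=0):
    rng = np.random.default_rng(seed)
    M = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))
    # m random Gaussian measurement matrices and their inner products with M.
    A = rng.standard_normal((m, n, n))
    b = np.einsum("kij,ij->k", A, M)
    # Nuclear norm minimization subject to matching all m linear measurements.
    X = cp.Variable((n, n))
    constraints = [cp.sum(cp.multiply(A[k], X)) == b[k] for k in range(m)]
    cp.Problem(cp.Minimize(cp.normNuc(X)), constraints).solve()
    return np.linalg.norm(X.value - M, "fro") / np.linalg.norm(M, "fro") < tol
```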
2.3. More general bases
Our main result (Theorem 2.3) extends to a variety of other
low-rank matrix completion problems beyond the sampling
of entries. Indeed, suppose we have two orthonormal bases
f1, …, fn and g1, …, gn of R^n, and that we are interested in solving the rank minimization problem

minimize    rank(X)
subject to  fi^T X gj = fi^T M gj,  (i, j) ∈ Ω.    (2.7)
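As an illustration, here is a sketch of the nuclear norm heuristic applied to (2.7), replacing rank(X) with the nuclear norm; the random orthonormal bases built by QR factorization, and the cvxpy toolchain, are our assumptions.

```python
import numpy as np
import cvxpy as cp

def general_basis_trial(n=50, r=4, m=1000, tol=1e-3, seed=0):
    rng = np.random.default_rng(seed)
    # Random orthonormal bases: columns of F and G play the roles of fi and gj.
    F, _ = np.linalg.qr(rng.standard_normal((n, n)))
    G, _ = np.linalg.qr(rng.standard_normal((n, n)))
    M = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))
    # Sample m coefficient locations (i, j) uniformly at random.
    idx = rng.choice(n * n, size=m, replace=False)
    rows, cols = np.unravel_index(idx, (n, n))
    C = F.T @ M @ G  # C[i, j] = fi^T M gj, the observable coefficients
    X = cp.Variable((n, n))
    Y = F.T @ X @ G  # affine in X, so the equality constraints remain convex
    constraints = [Y[i, j] == C[i, j] for i, j in zip(rows, cols)]
    cp.Problem(cp.Minimize(cp.normNuc(X)), constraints).solve()
    return np.linalg.norm(X.value - M, "fro") / np.linalg.norm(M, "fro") < tol
```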
The machine learning community’s interest in specialized
algorithms for multiclass and multitask learning provides
a motivating example (see, e.g., Amit et al.1 and Argyriou et al.2). In multiclass learning, the goal is to build multiple classifiers with the same training data to distinguish
between more than two categories. For example, in face
recognition, one might want to classify whether an image
patch corresponds to an eye, nose, or mouth. In multitask
learning, we have a large set of data and a variety of different classification tasks, but, for each task, only partial
subsets of the data are relevant. For instance, in activity
recognition, we may have acquired sets of observations of
multiple subjects and want to determine whether each observed
person is walking or running. However, a different classifier is desired for each individual, and it is not clear how
having access to the full collection of observations can
improve classification performance. Multitask learning
aims to take advantage of access to the full database to
improve performance on individual tasks. A description
of how to apply our results to the multiclass setting can be