Whereas the rank function equals the number of nonvanishing singular values, the nuclear norm $\|X\|_*$ equals their sum. The nuclear norm is to the rank functional what the $\ell_1$ norm is to the $\ell_0$ norm in the area of sparse signal recovery. The main point here is that the nuclear norm is a convex function and can be optimized efficiently via semidefinite programming.
There are many norms one could define for a given matrix. The operator norm $\|X\|$ is the largest singular value. The Frobenius norm $\|X\|_F$ is the square root of the sum of the squares of the entries, and is thus the analog of the standard Euclidean norm on a real vector space. Why should the nuclear norm produce lower-rank solutions than either of these two more commonly studied norms?
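To make these quantities concrete, here is a minimal numpy sketch (the rank-2 test matrix is an arbitrary illustration) computing each norm from the singular values:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 2)) @ rng.standard_normal((2, 50))  # a rank-2 matrix

s = np.linalg.svd(A, compute_uv=False)  # singular values, in decreasing order

rank = int(np.sum(s > 1e-10))       # rank: number of nonvanishing singular values
nuclear = s.sum()                   # nuclear norm: their sum
operator = s[0]                     # operator norm: the largest singular value
frobenius = np.sqrt(np.sum(s**2))   # Frobenius norm: sqrt of sum of squared entries

assert np.isclose(frobenius, np.linalg.norm(A))  # entrywise Euclidean norm agrees
print(rank, nuclear, operator, frobenius)
```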
One can gain further intuition by analyzing the geometric
structure of the nuclear norm ball. The unit nuclear norm
ball is precisely the convex hull of the rank 1 matrices of unit
Frobenius norm. The nuclear norm minimization problem
(2.3) can be interpreted as inflating the unit ball until it
just touches the affine space $X_{ij} = M_{ij}$, $(i,j) \in \Omega$. Such an intersection
will occur at an extreme point of the nuclear norm ball, and
these extreme points are sparse convex combinations of
rank 1 matrices. That is, the extreme points of the nuclear
norm ball have low rank. This phenomenon is depicted
graphically in Figure 1. There, we plot the unit ball of the nuclear norm for matrices parametrized as

$$\begin{bmatrix} x & y \\ y & z \end{bmatrix}.$$

The extreme points of this cylindrical object are the rank 1 matrices with unit Frobenius norm. The red line in this figure is a “random,” one-dimensional, affine subspace which, as expected, intersects the nuclear norm ball at a rank 1 matrix.

Figure 1. Unit ball of the nuclear norm for symmetric 2 × 2 matrices. The red line depicts a random one-dimensional affine space. Such a subspace will generically intersect a sufficiently large nuclear norm ball at a rank one matrix.
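To make the semidefinite programming claim concrete, here is a minimal sketch of problem (2.3) using the cvxpy modeling package; cvxpy, the synthetic rank-2 matrix, and the 50% sampling rate are all illustrative assumptions, not choices made in the text.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(1)
n = 20
M = rng.standard_normal((n, 2)) @ rng.standard_normal((2, n))  # rank-2 ground truth
mask = (rng.random((n, n)) < 0.5).astype(float)                # Omega: observed entries

X = cp.Variable((n, n))
problem = cp.Problem(
    cp.Minimize(cp.normNuc(X)),           # minimize the nuclear norm of X
    [cp.multiply(mask, X) == mask * M],   # X_ij = M_ij on the observed entries
)
problem.solve()
print("relative error:", np.linalg.norm(X.value - M) / np.linalg.norm(M))
```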
The nuclear norm is also connected to the matrix factorization heuristics popular in data-mining and collaborative filtering. In these fields, researchers commonly aim to find an explicit factorization $X = LR^T$ that agrees with the measured entries. Here $L$ and $R$ are $n \times k$ matrices. Since there are many possible such factorizations that might agree with the observations, a common approach searches for matrices $L$ and $R$ that have Frobenius norm as small as possible, that is, the solution of the optimization problem

$$\begin{array}{ll} \text{minimize} & \frac{1}{2}\left(\|L\|_F^2 + \|R\|_F^2\right)\\[2pt] \text{subject to} & X_{ij} = M_{ij}, \quad (i,j) \in \Omega\\[2pt] & X = LR^T \end{array} \qquad (2.4)$$

where we are minimizing with respect to $L \in \mathbb{R}^{n \times k}$, $R \in \mathbb{R}^{n \times k}$, and $X \in \mathbb{R}^{n \times n}$, and $\|\cdot\|_F$ denotes the Frobenius norm.
Surprisingly, the optimization problem (2.4) is equivalent to minimization of the nuclear norm subject to the same equality constraints, provided $k$ is chosen to be larger than the rank of the optimum of the nuclear norm problem (2.3).
To get an intuition for this equivalence, take any matrix $X$ of rank $k$. Suppose its SVD is $X = U\Sigma V^T$. If we set $L = U\Sigma^{1/2}$ and $R = V\Sigma^{1/2}$, then
$$\frac{1}{2}\left(\|L\|_F^2 + \|R\|_F^2\right) = \|X\|_*,$$

because $\|L\|_F^2 = \|R\|_F^2 = \sum_j \sigma_j(X)$, the singular values of both $L$ and $R$ being $\sigma_j(X)^{1/2}$ for all $j$. Thus, the optimal solution of (2.3), factored in this way, is feasible for (2.4) and achieves the same objective value, so the optimal value of (2.4) is at most that of (2.3). The full equivalence can be seen via an appeal to semidefinite programming and can be found in Recht et al.
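This identity is easy to check numerically; the following sketch (with an arbitrary rank-3 test matrix) verifies it:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((30, 3)) @ rng.standard_normal((3, 30))  # rank-3 matrix

U, s, Vt = np.linalg.svd(X, full_matrices=False)
L = U @ np.diag(np.sqrt(s))     # L = U Sigma^{1/2}
R = Vt.T @ np.diag(np.sqrt(s))  # R = V Sigma^{1/2}

factored = 0.5 * (np.linalg.norm(L, 'fro')**2 + np.linalg.norm(R, 'fro')**2)
assert np.isclose(factored, s.sum())  # equals the nuclear norm of X
assert np.allclose(L @ R.T, X)        # and the factorization reproduces X
```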
The main advantage of this reformulation (2.4) is to substantially decrease the number of decision variables from $n^2$ to $2nk$. For large problems, this leads to a significant reduction in computation time, such that very large instances can be solved on a desktop computer. On the other hand, the formulation (2.4) is nonconvex and thus potentially has local minima that are not globally optimal. Nonetheless, this factored approximation (2.4) of the nuclear norm is one of the most successful stand-alone approaches to solving the Netflix Prize problem [16, 24]. Indeed, it was one of the foundational components of the winning team’s prediction engine.
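As an illustration of the factored approach in practice, here is a minimal gradient-descent sketch for a common penalized variant of (2.4), in which the equality constraints on the observed entries are replaced by a least-squares data term; the variant, sizes, and hyperparameters are all assumptions of this example, not details from the text.

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 100, 5
M = rng.standard_normal((n, k)) @ rng.standard_normal((k, n))  # low-rank ground truth
mask = rng.random((n, n)) < 0.3                                # Omega: observed entries

# Factored variables: 2nk numbers instead of n^2.
L = 0.1 * rng.standard_normal((n, k))
R = 0.1 * rng.standard_normal((n, k))

step, lam = 0.01, 0.1
for _ in range(3000):
    residual = mask * (L @ R.T - M)       # misfit on the observed entries only
    gradL = residual @ R + lam * L        # gradient of the penalized objective
    gradR = residual.T @ L + lam * R
    L -= step * gradL
    R -= step * gradR

print("relative error:", np.linalg.norm(L @ R.T - M) / np.linalg.norm(M))
```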
2.1. Main results
As seen in our first example (2.1), it is impossible to recover a matrix which is equal to 0 in nearly all of its entries unless we see all the entries of the matrix. This situation arises whenever the singular vectors of a matrix M have most of their mass
concentrated in a few coordinates. For instance, consider
the rank 2 symmetric matrix $M$ given by

$$M = \sum_{k=1}^{2} \sigma_k\, u_k u_k^*, \qquad u_1 = \frac{e_1 + e_2}{\sqrt{2}},\quad u_2 = \frac{e_1 - e_2}{\sqrt{2}},$$

where the singular values $\sigma_1, \sigma_2$ are arbitrary. Then, this matrix vanishes everywhere except in the top-left $2 \times 2$ corner, and one would basically need to see all the entries of $M$ to be able to recover this matrix exactly.
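To see concretely why such a matrix defeats random sampling, note that $M$ has only four nonzero entries. The sketch below (with an illustrative $n = 1000$ and arbitrary singular values $\sigma_1 = 3$, $\sigma_2 = 1$) shows that a uniform sample of entries is overwhelmingly likely to contain only zeros:

```python
import numpy as np

n = 1000
e1, e2 = np.zeros(n), np.zeros(n)
e1[0], e2[1] = 1.0, 1.0
u1, u2 = (e1 + e2) / np.sqrt(2), (e1 - e2) / np.sqrt(2)

# Rank-2 symmetric M, supported entirely on the top-left 2 x 2 corner.
M = 3.0 * np.outer(u1, u1) + 1.0 * np.outer(u2, u2)

# Sample m entries uniformly at random; the chance that any one of them
# lands in the corner is 4/n^2, so the observations are almost surely all zero.
rng = np.random.default_rng(4)
m = 10_000
rows, cols = rng.integers(0, n, size=m), rng.integers(0, n, size=m)
print("nonzero observations:", np.count_nonzero(M[rows, cols]))  # typically 0
```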
There is an endless list of examples of this sort. Hence, we arrive at the notion that
the singular vectors need to be sufficiently spread across