The input to the algorithm consists of (access to) the sampling operator F, the samples u, the target sparsity level s,
and a halting criterion. We impose the following hypotheses:
• The sparsity level s is fixed, and the m × N sampling operator F has restricted isometry constant δ_4s ≤ 0.1.
The algorithm is initialized with a trivial signal estimate,
meaning that the initial residual is the entire unknown target signal. Each iteration then consists of five major steps.
1. Identification. Using the current samples, the algorithm computes a vector that is highly correlated with the signal, called the signal proxy. From the proxy, components of the signal that carry a lot of energy are located.
2. Support Merger. The set of newly identified components is united with the set of components that appear
in the current approximation.
3. Estimation. The algorithm solves a least-squares problem to approximate the target signal on the merged set
of components.
4. Pruning. The algorithm produces a new approximation by retaining only the largest entries in this least-squares signal approximation.
5. Sample Update. Finally, the samples are updated so
that they reflect the residual, the part of the signal that
has not been approximated.
These five steps are repeated until the halting criterion is
satisfied. In our analysis, we assume that the method uses a
fixed number of iterations, but Needell and Tropp18, 19 discuss other possibilities. The CoSaMP algorithm can be summarized by the pseudocode presented as Algorithm 1.
Remark 1. We emphasize that the method we present is a specific example from a more general framework. The articles18, 19
discuss a number of variations. In particular, note that the term 2s in the identification step can be replaced by αs for other values of α. This type of tuning is valuable in practice, but it makes the proof more complicated.
Remark 2. Some knowledge about the sparsity level is required as input to the algorithm. There are several strategies for estimating this parameter if it is not known a priori. One such method would be to use the number m of measurements to deduce the sparsity level. Since m ≈ 2s log N is often sufficient, the estimate s ≈ m/(2 log N) is reasonable. A second approach would be to run the CoSaMP algorithm using a variety of sparsity levels and select the best approximation a obtained using the empirical error ||Fa − u||_2. If one varies s according to a geometric progression (e.g., s = 1, 2, 4, 8, . . . , m), this variation increases the runtime by at most O(log m). See Appendix A of Needell and Tropp19 for more details.
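To make the second strategy concrete, here is a minimal Python sketch, assuming a routine cosamp(F, u, s) such as the one sketched after Algorithm 1 below; the helper name select_sparsity and the doubling loop are illustrative assumptions, not part of the original presentation.

import numpy as np

def select_sparsity(F, u, cosamp):
    # Run CoSaMP for s = 1, 2, 4, 8, ..., m and keep the approximation
    # with the smallest empirical error ||Fa - u||_2 (Remark 2).
    m = F.shape[0]
    best_a, best_err = None, np.inf
    s = 1
    while s <= m:
        a = cosamp(F, u, s)
        err = np.linalg.norm(F @ a - u)
        if err < best_err:
            best_a, best_err = a, err
        s *= 2  # geometric progression: only O(log m) extra runs
    return best_a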
Algorithm 1: CoSaMP
input: Sampling matrix F, noisy sample vector u, sparsity level s
output: An s-sparse approximation a of the target signal

a^0 ← 0, v ← u, k ← 0              { Initialization }
repeat
    k ← k + 1
    y ← F*v                        { Form signal proxy }
    Ω ← supp(y_2s)                 { Identify large components }
    T ← Ω ∪ supp(a^(k−1))          { Merge supports }
    b|_T ← F_T† u                  { Signal estimation }
    b|_{T^c} ← 0
    a^k ← b_s                      { Prune approximation }
    v ← u − F a^k                  { Update current samples }
until halting criterion true
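To make Algorithm 1 concrete, the following is a minimal NumPy sketch of one possible implementation; the dense least-squares solver and the fixed iteration count standing in for the halting criterion are simplifying assumptions, not the iterative method analyzed in Section 3.2.

import numpy as np

def cosamp(F, u, s, max_iter=20):
    # Illustrative CoSaMP sketch following Algorithm 1; a fixed number of
    # iterations stands in for the halting criterion.
    m, N = F.shape
    a = np.zeros(N, dtype=F.dtype)                 # a^0 <- 0
    v = u.copy()                                   # v <- u
    for _ in range(max_iter):
        y = F.conj().T @ v                         # form signal proxy y = F* v
        omega = np.argsort(np.abs(y))[-2 * s:]     # identify 2s large components
        T = np.union1d(omega, np.flatnonzero(a))   # merge supports
        sol, *_ = np.linalg.lstsq(F[:, T], u, rcond=None)  # b|_T = F_T† u
        b = np.zeros(N, dtype=F.dtype)
        b[T] = sol
        keep = np.argsort(np.abs(b))[-s:]          # prune to the s largest entries
        a = np.zeros(N, dtype=F.dtype)
        a[keep] = b[keep]
        v = u - F @ a                              # update samples to reflect the residual
    return a

Following Remark 2, this routine could be called for a geometric progression of sparsity levels, keeping the output with the smallest empirical error ||Fa − u||_2.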
3.2. Implementation
CoSaMP was designed to be a practical algorithm for signal recovery, so it is worth spending some time on implementation issues. The least-squares step in the algorithm makes the largest contribution to the runtime. Fortunately, we have constructed F_T to ensure it is well conditioned, so we can apply its pseudoinverse quickly using an iterative method. Section 5 of Needell and Tropp19 contains a thorough analysis of iterative least-squares methods as modules in CoSaMP. This analysis shows that the cost of solving the least-squares problem is O(ℒ), where ℒ bounds the cost of a matrix–vector multiply with F_T or F_T*.
The remaining steps of the algorithm involve standard techniques, and we can estimate the operation counts as follows.
Proxy. Forming the proxy is dominated by the cost of the
matrix–vector multiply F*v.
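As a one-line illustration (the function name is an assumption made here for clarity):

def signal_proxy(F, v):
    # y = F* v: a single matrix-vector multiply with the adjoint of F.
    return F.conj().T @ v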
Identification. We can locate the largest 2s entries of a vector in time O(N) using the approach in Cormen et al.7, Ch. 9.
In practice, it may be faster to use an O(N log N) sorting
algorithm (e.g., quicksort, mergesort, heapsort7, Sec. II) on the
entries of the signal and then select the first 2s of them.
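For instance, NumPy's argpartition performs a partial sort of this kind, selecting the 2s largest-magnitude entries of the proxy without sorting it completely; this sketch and its function name are illustrative.

import numpy as np

def largest_2s_support(y, s):
    # Indices of the 2s largest-magnitude entries of y, found in (expected)
    # linear time by a partial sort rather than a full O(N log N) sort.
    k = min(2 * s, y.size)
    return np.argpartition(np.abs(y), -k)[-k:]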
Support Merger. We can merge two support sets of size O(s) in expected time O(s) via randomized hashing methods7, Ch. 11, or by sorting both sets first and using the elementary merge procedure7, p. 29 for a total cost O(s log s).
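A sketch of this step, assuming the supports are held as integer index arrays: a hash-based set union gives the expected O(s) behavior, while producing a sorted index array for the least-squares step adds the O(s log s) merge cost.

import numpy as np

def merge_supports(omega, prev_support):
    # Hash-based union of two index sets of size O(s): expected O(s) time.
    merged = set(omega.tolist()) | set(prev_support.tolist())
    # Sorting the merged indices for later use costs O(s log s).
    return np.array(sorted(merged), dtype=int)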
LS Estimation. We can use the conjugate gradient method or the LSQR algorithm (see, e.g., Paige and Saunders22) to compute F_T† u. Initializing the least-squares algorithm requires a matrix–vector multiply with F_T*, and each iteration requires one matrix–vector multiply each with F_T and F_T*. Since F_T is a submatrix of F, the matrix–vector multiplies can also be obtained from multiplication with the full matrix. We show in Section 5 of Needell and Tropp19 that a constant number of least-squares iterations suffices for our results to hold.
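As a sketch of how this step can be organized (the specific routine, SciPy's LSQR, and the iteration cap are assumptions made for illustration, not the analysis cited above), each iteration needs only one multiply with F_T and one with F_T*:

import numpy as np
from scipy.sparse.linalg import LinearOperator, lsqr

def ls_estimate(F, u, T, iters=10):
    # Approximate b|_T = F_T† u with a few LSQR iterations, using only
    # matrix-vector products with F_T and its adjoint F_T*.
    m, N = F.shape
    FT_cols = F[:, T]                                        # F_T as a submatrix of F
    FT = LinearOperator((m, len(T)),
                        matvec=lambda x: FT_cols @ x,            # multiply with F_T
                        rmatvec=lambda r: FT_cols.conj().T @ r,  # multiply with F_T*
                        dtype=F.dtype)
    bT = lsqr(FT, u, iter_lim=iters)[0]
    b = np.zeros(N, dtype=F.dtype)
    b[T] = bT
    return b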
Pruning. As in the identification step, pruning can be