100 COMMUNICATIONS OF THE ACM | AUGUST2019 | VOL. 62 | NO. 8
almost all of W, or (c) has roughly half of W (between 1/3 and
2/3, say). To fix the latter case, we then “grab” all vertices in
with some Ω( 1)-fraction, for example 1/6th, of their neigh-
bors in S and simply move them all to S. Doing this some
constant number of times implies S has much more than
2/3rds of W (and if S was in case (a), then we show it still has
almost none of W). Then by doing another round of closure
moves, one can ensure that both S, are closed, and each of
them has either an O(ε)-fraction of W or a ( 1 − O(ε) )-fraction.
It is worth noting that our algorithm can make use of any
spectral cutting algorithm as a black box and not just Fiedler
cuts, followed by our grab and closure steps. For example,
algorithms from14, 15 run in nearly linear time and either ( 1)
report that no γ-conductance cut exists (in which case we
could terminate), ( 2) find a balanced cut of conductance
As a result we end up completing the cluster preserv-
ing clustering in polynomial time in the graph size, which
is polynomial in poly(log n), and this allows us to find the
k heavy hitters with whp in O(k poly(log n)) time. For a full
description of the algorithm, the reader is referred to refer-
ence Larsen et al. 11
We thank Noga Alon for pointing us to (Alon and Chung, 1
Lemma 2. 3), Piotr Indyk for the reference, 8 Yi Li for answering several questions about, 8 Mary Wootters for making us
aware of the formulation of the list-recovery problem in coding theory and its appearance in prior work in compressed
sensing and group testing, and Fan Chung Graham and
Olivia Simpson for useful conversations about graph partitioning algorithms.
Figure 4. Each small oval is a spectral cluster. They are well-connected internally, with sparse cuts to the outside. The large oval is the rest
of the graph, which can look like anything. Cut (a) represents a good low-conductance cut, which makes much progress (cutting the graph in
roughly half) while not losing any mass from any cluster. Cut (b) is also a low-conductance cut as long as the number of small ovals is large,
since then cutting one cluster in half has negligible effect on the cut’s conductance. However, (b) is problematic since recursing on both sides
loses half of one cluster forever.
1. Alon, N., Chung, F.R.K. Explicit construction
of linear sized tolerant networks.
Discrete Math. 72 (1988), 15–19.
2. Bar-Yossef, Z., Jayram, T. S., Kumar, R.,
Sivakumar, D. An information
statistics approach to data stream and
communication complexity. J. Comput.
Syst. Sci. 68, 4 (2004), 702–732.
3. Braverman, V., Chestnut, S.R.,
Ivkin, N., Nelson, J., Wang, Z.,
Woodruff, D.P. BPTree: An 2 heavy
hitters algorithm using constant
memory. In Proceedings of the 36th
Symposium on Principles of Database
Systems (PODS) (2017), ACM,
Chicago, IL, 361–376.
4. Braverman, V., Chestnut, S.R., Ivkin, N.,
Woodruff, D.P. Beating CountSketch
for heavy hitters in insertion streams.
In Proceedings of the 48th S TOC
(2016), ACM, Cambridge, MA.
5. Charikar, M., Chen, K., Farach-Colton, M.
Finding frequent items in data streams.
Theor. Comput. Sci. 312, 1 (2004), 3–15.
6. Cormode, G., Hadjieleftheriou, M.
Finding frequent items in data streams.
PVLDB 1, 2 (2008), 1530–1541.
7. Cormode, G., Muthukrishnan, S. An
improved data stream summary: The
count-min sketch and its applications.
J. Algorithms 55, 1 (2005), 58–75.
8. Gilbert, A.C., Li, Y., Porat, E., Strauss, M.J.
For-all sparse recovery in near-optimal
time. In Proceedings of the 41st
ICALP (2014), Springer, Copenhagen,
9. Jowhari, H., Saglam, M., Tardos, G.
Tight bounds for Lp samplers, finding
duplicates in streams, and related
problems. In Proceedings of the 30th
PODS (2011), ACM, Athens, Greece,
10. Kannan, R., Vempala, S., Vetta, A. On
clusterings: Good, bad and spectral. J.
ACM 51, 3 (2004), 497–515.
11. Larsen, K.G., Nelson, J., Nguy˜ ên, H. L.,
Thorup, M. Heavy hitters via
cluster-preserving clustering. CoRR,
12. Metwally, A., Agrawal, D., El Abbadi, A.
Efficient computation of frequent and
top-k elements in data streams, In
Proceedings of the 10th ICDT (2005),
Springer, Edinburgh, UK, 398–412.
13. Misra, J., Gries, D. Finding repeated
elements. Sci. Comput. Program 2, 2
14. Orecchia, L., Sachdeva, S., Vishnoi, N.K.
Approximating the exponential, the
Lanczos method and an Õ(m)-time
spectral algorithm for balanced
separator. In Proceedings of the 44th
STOC (2012), 1141–1160.
15. Orecchia, L., Vishnoi, N.K. Towards
an SDP-based approach to spectral
methods: A nearly-linear-time algorithm
for graph partitioning and decomposition.
In Proceedings of the 22nd SODA (2011),
SIAM, San Francisco, CA, 532–545.
16. Spielman, D.A. Linear-time encodable
and decodable error-correcting codes.
IEEE Trans. Information Theory 42, 6
Kasper Green Larsen ( firstname.lastname@example.org),
Aarhus University, Aarhus, Denmark.
Jelani Nelson (minilek@seas.
harvard.edu), Harvard University,
Cambridge, MA, USA.
Huy L. Nguyê˜n ( hu.nguyen@northeastern.
edu), Northeastern University, Boston,
Mikkel Thorup ( email@example.com),
University of Copenhagen, Denmark.
Copyright held by authors/owners.