red with probability f. If we repeat the experiment k times
with k independent hash functions, we get a multiset S of
k samples with replacement from X and the fraction red
balls in S concentrates around f as we increase the number
Consider the alternative experiment using a single hash
function, where we use some bits of the hash value to partition X into k bins, and then use the remaining bits as a local
hash value. We pick the ball with the smallest hash value in
each bin. This is a sample S from X without replacement,
and again, the fraction of red balls is concentrated around f.
The big difference between the two schemes is that the
second one runs Ω(k) times faster. In the first experiment,
each ball participated in k independent experiments, but in
the second one with k-partitions, each ball picks its bin, and
then only participates in the local experiment for that bin.
Thus with the k-partition, essentially, we get k experiments
for the price of one.
This generic idea has been used for different types of statistics. Flajolet and Martin10 introduced it to count the number of distinct items in a multiset, and recently, Li et al. 14
used it for Minwise estimation of the Jaccard Similarity of
The issue is that no realistic hashing scheme was known
to make a good enough k-partition for the above kind of statistics to make sense. The point is that the contents of different bins may be too correlated, and then we get no better
concentration with a larger k. In the independence paradigm of Wegman and Carter, 26 it would seem that we need
independence at least k to get sufficiently independent statistics from the different bins.
An efficient solution is based on a variant of double tabulation described below.
4. 3. Mixed tabulation
For Theorem 8, we may use d = 4 even if c is larger, but then
h0 will introduce many collisions. To avoid this problem we
mix the schemes in mixed tabulation. Mathematically, we
use two simple tabulation hash functions h1 : [u] → Σd and
h2 : Σc+d → [2r], and define the hash function h(x) h2(x
h1(x) ), where denotes concatenation of characters. We call
x h1(x) the derived key, consisting of c original characters
and d derived characters. Since the derived keys includes the
original keys, there are no duplicate keys.
We note that mixed tabulation only requires c+d lookups if we instead store simple tabulation functions h1, 2 : Σc
→ Σd × [r] and , computing h(x) by (v1, v2) = h1, 2(x);
h(x) = v1 ⊕ h2(v2). This efficient implementation is similar
to that of twisted tabulation, and is equivalent to the previous definition. As long as we have at least one derived
character, mixed tabulation has all the distribution properties of twisted tabulation, particularly, the Chernoff-style
concentration bound from Theorem 5. At the same time,
we get the full randomness from Theorem 8 for any given
set S of size ( 1 − Ω( 1))Σ. Based on these properties and
more, it is proved in Ref. 7 that mixed tabulation, w.h.p.,
gets essentially the same concentration bounds as full randomness for all of the abovementioned statistics based on
Research partly supported by Advanced Grant DFF-0602-
02499B from the Danish Council for Independent Research
under the Sapere Aude research carrier program.
1. Alon, N., Matias, Y., Szegedy, M. The
space complexity of approximating
the frequency moments. J. Comput.
Syst. Sci. 58, 1 (1999), 209–223, 1999.
Announced at S TOC’ 96.
2. Azar, Y., Broder, A.Z., Karlin, A.R.,
Upfal, E. Balanced allocations.
SIAM J. Comput. 29, 1 (1999),
180–200. Announced at S TOC’ 94.
3. Braverman, V., Chung, K.-M., Liu, Z.,
Mitzenmacher, M., Ostrovsky, R. AMS
without 4-wise independence on
product domains. In Proceedings of
the 27th S TACS (2010), 119–130.
4. Broder, A.Z., Charikar, M., Frieze, A.M.,
Mitzenmacher, M. Min-wise independent
permutations. J. Comput. Syst. Sci.
60, 3 (2000), 630–659. Announced
at STOC’ 98.
5. Carter, L., Wegman, M.N. Universal
classes of hash functions. J. Comput.
Syst. Sci. 18, 2 (1979), 143–154.
Announced at S TOC’ 77.
6. Christiani, T., Pagh, R., Thorup, M.
From independence to expansion and
back again. In Proceedings of the
47th STOC (2015), 813–820.
7. Dahlgaard, S., Knudsen, M.B. T.,
Rotenberg, E., Thorup, M. Hashing
for statistics over k-partitions. In
Proceedings of the 56th FOCS (2015),
8. Dahlgaard, S., Knudsen, M.B. T.,
Rotenberg, E., Thorup, M. The power
of two choices with simple tabulation.
In Proceedings of the 27th SODA
9. Dahlgaard, S., Thorup, M.
independence with twisted tabulation.
In Proceedings of the 14th S WAT
10. Flajolet, P., Martin, G.N. Probabilistic
counting algorithms for data base
applications. J. Comput. Syst. Sci. 31,
2 (1985), 182–209.
11. Goodrich, M. T., Mitzenmacher, M.
Invertible bloom lookup tables. In
Proceedings of the 49th Allerton
Conference on Communication,
Control, and Computing (2011),
12. Indyk, P. A small approximately
min-wise independent family of
hash functions. J. Algorithms 38,
1 (2001), 84–90. Announced at
13. Knuth, D. E. Notes on open addressing.
14. Li, P., Owen, A.B., Zhang, C.-H. One
permutation hashing. In Proceedings
of the 26th NIPS (2012), 3122–3130.
15. Mitzenmacher, M., Vadhan, S.P.
Why simple hash functions work:
Exploiting the entropy in a data
stream. In Proceedings of the 19th
SODA (2008), 746–755.
16. Motwani, R., Raghavan, P. Randomized
Algorithms. Cambridge University
17. Pagh, A., Pagh, R. Uniform hashing
in constant time and optimal space.
SIAM J. Comput. 38, 1 (2008), 85–96.
18. Pagh, A., Pagh, R., Ružić, M. Linear
probing with constant independence.
SIAM J. Comput. 39, 3 (2009),
1107–1120. Announced at STOC’07.
19. Pagh, R., Rodler, F. F. Cuckoo hashing.
J. Algorithms 51, 2 (2004), 122–144.
Announced at ESA’01.
20. Pa ˇtras˛cu, M., Thorup, M. On the
k-independence required by linear
probing and minwise independence.
In Proceedings of the 37th ICALP
21. Pa ˇtras˛cu, M., Thorup, M. The power
of simple tabulation-based hashing.
J. ACM 59, 3 (2012), Article 14.
Announced at STOC’ 11.
22. Pa ˇtras˛cu, M., Thorup, M. Twisted
tabulation hashing. In Proceedings of
the SODA (2013), 209–228.
23. Schmidt, J.P., Siegel, A., Srinivasan, A.
for applications with limited
independence. SIAM J. Discrete Math.
8, 2 (1995), 223–250. Announced at
24. Siegel, A. On universal classes of
extremely random constant-time
hash functions. SIAM J. Comput. 33,
3 (2004), 505–543. Announced at
25. Thorup, M. Simple tabulation, fast
expanders, double tabulation, and
high independence. In Proceedings of
the 54th FOCS (2013), 90–99.
26. Wegman, M.N., Carter, L. New classes
and applications of hash functions.
J. Comput. Syst. Sci. 22, 3 (1981),
265–279. Announced at FOCS’ 79.
27. Zobrist, A. L. A new hashing method
with application for game playing.
Technical Report 88, Computer
Sciences Department, University of
© 2017 ACM 0001-0782/17/07 $15.00
Mikkel Thorup (mikkel2thorup@gmail.
com), Department of Computer Science,
University of Copenhagen, Denmark.