placed division with a cheaper multiplication by the reciprocal, in fixed point.
The fixed point in this case is at bit 33,
and the constant is one-third expressed
in these terms (it’s actually
0.33333333337213844). The compiler
has an algorithm for determining appropriate fixed points and constants to
achieve the division while preserving the
same rounding as an actual division operation with the same precision over the
range of the inputs. Sometimes this
requires a number of extra operations—for example, in dividing by 1022
as shown in (l, https://godbolt.org/z/
acm19_div1023).
The algorithm is well known and
documented extensively in the excellent book, Hacker’s Delight. 8
In short, you can rely on the compiler
to do a great job of optimizing division
by a compile-time-known constant.
You might be thinking: Why is this
such an important optimization? How
often does one actually perform integer division, anyway? The issue is not
so much with division itself as with the
related modulus operation, which is
often used in hash-map implementations as the operation to bring a hash
value into the range of the number of
hash buckets.
Knowing what the compiler can do
here can lead to interesting hash-map
implementations. One approach is to
use a fixed number of buckets to allow
the compiler to generate the perfect
modulus operation without using the
expensive divide instruction.
Most hash maps support rehashing
to a different number of buckets. Na-
ively, this would lead to a modulus with
a number known only at runtime, forc-
ing the compiler to emit a slow divide
instruction. Indeed, this is what the
GCC libstdc++ library implementation
of std::unordered _ map does.
Clang’s libc++ goes a little further:
it checks if the number of buckets is a
power of two, and if so skips the divide
instruction in favor of a logical AND.
Having a power-of-two bucket count is
alluring as it makes the modulus op-
eration fast, but in order to avoid exces-
sive collisions it relies on having a good
hash function. A prime-number bucket
count gives decent collision resistance
even for simplistic hash functions.
Some libraries such as
boost::multi _ index go a step fur-
ther: instead of storing the actual num-
ber of buckets, they use a fixed number
of prime-sized bucket counts (m).
That way, for all possible hash-table
sizes the compiler generates the perfect modulus code, and the only extra
(k)
(j)
You can rely on
the compiler to
do a great job of
optimizing division
by a compile-time-
known constant.