each step—but that was the original representation we
selected and then discarded because it was not suitable
for zips!
The final compromise is to allow either—but not both—
of these two representations. We cannot allow both—hence
there is only one new bundle member rather than two—
because while we can easily convert a MultisC a into a MultisP a,
the other direction is not efficiently implementable. The
final definition of the Multis type alias is therefore:
type Multis a = Either (MultisC a) (MultisP a)
Each stream function that can operate on Multi values
consumes the Multis a in the sMultis field of the stream
bundle. It must be prepared to accept either a MultisC or a
MultisP, which is a “mixed” stream of scalars and Multi’s.
However, we always try to produce a MultisC and only fall
back to a MultisP as a last resort. Even operations that can
work with either representation are often worth specializing for the MultisC form. In the case of msums above, this
allows us to gobble up as many Multi values as possible and
only then switch to consuming scalars, thereby cutting the
number of accumulating parameters in half and reducing
register pressure.
One could imagine attempting a representation that
somehow guarantees longer runs of Multis, but this would
add complexity and we doubt it would have any advantage
over the MultisC representation, which has a distinct “phase
shift” between operations on Multi and operations on scalars. For operations like zip that operate on multiple streams,
we would need to guarantee that both streams have the same
structure—it simply does not do to have one stream in the
pair yield a scalar while the other yields a Multi. The MultiC/
MultiP distinction neatly captures this requirement by framing it in terms of who has control over what is yielded next,
consumers or producers.
3. 2. A SIMD version of dotp
With a stream representation for SIMD computation in hand,
we can write a SIMD-ized version of the dot product from
Section 2:
dotp_simd :: Vector Double → Vector Double → Double
dotp_simd v w = msum (mzip With (*) v w)
The only difference with respect to the scalar implementation in Section 2. 1 is that we use variants of foldl′ and
zip With specialized to take function arguments that operate
on values that are members of the Num type class. While we
could have used versions of these functions that take two
function arguments (our library supports both options), one
for scalars and one for Multis, the forms that use overloading
to allow the function argument to be used at both the type
a → a → a and Multi a → Multi a → Multi a are a convenient
shorthand:
mfold′ :: (PackedVector Vector a, Num a, Num (Multi a) )
⇒(∀ b.Num b ⇒ b → b → b)
→ a → Vector a → a
mzip With :: (PackedVector Vector a, Num a, Num
(Multi a) )
⇒ (∀ b.Num b ⇒ b → b → b)
→ Vector a → Vector a → Vector a
msum :: (PackedVector Vector a, Num a, Num (Multi a))
⇒ Vector a → a
msum = mfold′ (+) 0
The particular fold we use here, mfold′, maintains two accumulators (a scalar and a Multi) when given a MultisP a and
one accumulator when given a MultisC a. The initial value
of the scalar accumulator is the third argument to mfold′
and the initial value of the Multi accumulator is formed by
replicating this scalar argument across a Multi. The result
of the fold is computed by combining the elements of the
Multi accumulator and the scalar accumulator using the
function multifold from Figure 1. Note that the first argument to mfold′ must be associative and commutative. The
PackedVector type class constraint ensures both that the
type a is an instance of Multi Type and that elements contained in the vector are contiguous so that they can be
extracted a Multi a at a time.
The stream version of mfold′, mfold′s, can generate efficient code no matter what representation is contained in
a Multis a. On the other hand, the stream version of mzipWith, mzipWiths, requires that both its vector arguments
have a MultisC representation. Since there is no good way
to zip two streams when one yields a scalar and the other a
Multi, if either bundle argument to mzip Withs does not have a
MultisC representation available, mzip Withs falls back to an
implementation that uses only scalar operations.
3. 3. Automatic parallelization
Using SIMD instructions does not come entirely for free.
Consider mapping over a vector represented using multis:
mmap :: (PackedVector Vector a)
⇒ (a → a)
→ (Multi a → Multi a)
→ Vector a → Vector a
To map efficiently over the vector, it does not suffice to
pass a function of type (a → a), because that does not work
over multis. We must also pass a semantically equivalent
multi-version of the function. For simple arithmetic, matters
are not too bad:
foo :: Vector Float → Vector Float
foov= mmap (λx y → x + y 2) (λx y → x + y 2) v
The two lambdas are at different types, but Haskell’s overloading takes care of that. We could attempt to abstract this
pattern like this:
mmap :: (PackedVector Vector a)
⇒ (∀ a.Num a ⇒ a → a)
→ Vector a → Vector a
But that attempt fails if you want operations in class Floating,
say, rather than Num. What we want is a way to automatically
multi-ize scalar functions (such as (λx y → x + y 2) above), so
that we get a pair of a scalar function and a multi function,
which in turn can be passed to map.