| Done
A stream is a triple of values: an internal (existentially
quantified) state, represented by the type variable s in the
above definition, a size, and a step function that, when
given a state, produces a Step. A Step may be Done, indicating that there are no more values in the Stream, it may
Yield a value and a new state, or it may produce a new state
but Skip producing a value. The presence of Skip allows
us to easily express functions like filter within the stream
fusion framework.
To see concretely how this helps us avoid recursive functions, let us write map for vectors using streams
map :: (a → b) → Vector a → Vector b
map f = unstream maps f stream
The functions stream and unstream convert a Vector to and
from a stream. A Vector is converted to a stream whose state
is an integer index and whose step function yields the value
at the current index, which is incremented at each step.
To convert a stream back into a Vector, unstream allocates
memory for a new vector and writes each element to the
vector as it is yielded by the stream—unstream embodies a
recursive loop. Though imperative, the allocation and writing of the vector are safely embedded in pure Haskell using
the ST monad. 9
The real work is done by maps, which is happily
non-recursive:
maps :: (a → b) → Stream a → Stream b
maps f (Stream step s) = Stream step′s
where
step′ s = case step s of
Yield x s′ → Yield (f x) s′
Skip s′ → Skip s′
Done → Done
With this definition, the equational rule mentioned in the
Introduction, map f map g ≡ map (f g), falls out automatically. To see this, let us first inline our new definition of map
in the expression map f map g:
mapf map g ≡
unstream maps f stream unstream maps
g stream
Given this form, we can immediately spot where an intermediate structure is formed—by the composition stream
unstream. This composition is, in effect, the identity function, so we should be able to eliminate it entirely. GHC’s
rewrite rules enable programmers to express algebraic identities such as stream unstream = id in a form that GHC can
understand and automatically apply. Stream fusion relies
critically on this ability, and the vector library includes
exactly this rule. With the rule in place, GHC transforms our
original composition of maps into
mapf map g ≡
unstream maps f maps g stream
Conceptually, stream fusion pushes all recursive loops
into the final consumer. The two composed invocations
of map become a composition of two non-recursive calls
to maps. The inliner is now perfectly capable of combin-
ing maps f maps g into a single Stream function. Stream
fusion gives us the equational rule map f map g ≡ map
(f g) for free.
2. 1. Fusing the vector dot product
The motivating example we will use for the rest of the
paper is the vector dot product. A high-level implementation of this function in Haskell might be written as
follows:
dotp :: Vector Double → Vector Double → Double
dotp v w = sum (zip With (*) v w)
It seems that this implementation will suffer from severe
inefficiency—the call to zipWith produces an unnecessary intermediate vector that is immediately consumed
by the function sum. In expressing dotp as a composition
of collective operations, we have perhaps gained a bit of
algorithmic clarity, but in turn we have incurred a performance hit.
We have already seen how stream fusion eliminates intermediate structures in the case of a composition of two calls
to map. Previous fusion frameworks could handle that example but were stymied by the presence of a zip With. However
stream fusion has no problem fusing zip With, which we can
see by applying the stream transformations we saw earlier
to dotp.
The first step is to re-express each Vector operation as the
composition of a Stream operation and appropriate conversions between Vectors and Streams at the boundaries. The
functions zipWith and sum are expressed in this form as
follows:
zip With :: (a → b → c) → Vector a → Vector b →
Vector c
zip With f v w = unstream (zip Withs f (stream v)
(stream w) )
sum :: Num a ⇒ Vector a → a
sum v = foldl′s 0 (+) (stream v)
It is now relatively straightforward to transform dotp to eliminate the intermediate structure:
dotp :: Vector Double → Vector Double → Double
dotp v w ≡ sum (zip With (*) v w)
≡ foldl′s 0 (+) (stream (unstream
(zip Withs (*) (stream v) (stream w))))
≡ foldl′s 0 (+)
(zip Withs (*) (stream v) (stream w))
This transformation again consists of inlining a few definitions, something that GHC can easily perform, and rewriting
the composition stream unstream to the identity function.
After this transformation, the production (by zip With) and
following consumption (by sum) of an intermediate Vector
becomes the composition of non-recursive functions on
streams.
We can see how iteration is once again pushed into the
final consumer by looking at the implementations of foldl’s
and zipWiths. The final consumer in dotp is foldl′s, which
is implemented by an explicit loop that consumes stream