results show that there are still some differences between
them due to different code generators.e We measured a VM
startup time of 18 ms for V8 and 30 ms for SpiderMonkey.
These times are included along with compilation times as
bars stacked on top of the execution time of each benchmark. Overall, the results show that WebAssembly is competitive with native code, with seven benchmarks within
10% of native and nearly all of them within 2× of native.
We also measured the execution time of the PolyBenchC
benchmarks running on asm.js. On average, WebAssembly is
33.7% faster than asm.js. Validation in particular is significantly
more efficient. For SpiderMonkey, WebAssembly validation
takes less than 3% of the time of asm.js validation. In V8,
memory consumption of WebAssembly validation is less
than 1% of that for asm.js validation.
Code size. Figure 5 compares code sizes between
WebAssembly, minified asm.js, and x86-64 native code. For
the asm.js comparison we use the Unity benchmarks, for the
native code comparison the PolyBenchC and SciMark
benchmarks. For each function in these benchmarks, a
yellow point is plotted at ⟨size_asm.js, size_wasm⟩ and a blue point at
⟨size_x86, size_wasm⟩. Any point below the diagonal represents code
for which WebAssembly is smaller than the corresponding
other representation. On average, WebAssembly code is
62.5% the size of asm.js, and 85.3% of native x86-64 code.
WebAssembly's structured control flow makes compilation
simpler and more efficient, avoiding the limitation of JITs that
usually do not support irreducible control flow. Reusing the
advanced JITs from four different JavaScript engines has
been a resounding success that allowed all engines to
achieve high performance in a short time.
Bounds checks. By design, all memory accesses in
WebAssembly can be guaranteed safe with a single dynamic
bounds check, which amounts to checking the address
against the current size of the memory. An engine will allocate the memory in a large contiguous range beginning at
some (possibly non-deterministic) base in the engine’s
process, so that all access amounts to a hardware address
base+addr. While base can be stored in a dedicated machine
register for quick access, a more aggressive strategy is to
specialize the machine code for each instance to a specific
base, embedding it as a constant directly into the code,
freeing a register. Although the base may change when the
memory is grown dynamically, it changes so infrequently that it
is affordable to patch the machine code when it does.
On 64-bit platforms, an engine can make use of virtual
memory to eliminate bounds checks for memory accesses
altogether. The engine simply reserves 8 GB of virtual address
space and marks as inaccessible all pages except the valid portion of memory near the start. Since a WebAssembly memory
address is a 32-bit integer plus a 32-bit static offset,
by construction no access can reach further than 8 GB beyond
base. Consequently, the JIT can simply emit plain load/store
instructions and rely on hardware protection mechanisms to
catch out-of-bounds accesses.
Parallel and streaming compilation. With ahead-of-time
compilation it is a clear performance win to parallelize
compilation of WebAssembly modules, dispatching individual functions to different threads. For example, both V8
and SpiderMonkey achieve a 5-6× improvement in compilation speed with eight compilation threads. In addition, the
design of the WebAssembly binary format supports
streaming where an engine can start compilation of individual
functions before the full binary has been loaded. When
combined with parallelization, this minimizes cold startup time.
Code caching. Besides cold startup, warm startup time is
important as users will likely visit the same Web pages
repeatedly. The JavaScript API for the IndexedDB database
allows JavaScript to manipulate and compile WebAssembly
modules and store their compiled representation as an
opaque blob. This allows a JavaScript application to first
query IndexedDB for a cached version of their WebAssembly
module before downloading and compiling it. In V8 and
SpiderMonkey, this mechanism can offer an order of magnitude improvement of warm startup time.
5.1. Measurements
Execution. Figure 4 shows the execution time of the
PolyBenchC benchmark suite running on WebAssembly on
both V8 and SpiderMonkey normalized to native execution.d
Times for both engines are shown as stacked bars, and the
[Figure 4: stacked bars per PolyBenchC benchmark (2mm, 3mm, adi, bicg, cholesky, correlation, covariance, doitgen, durbin, dynprog, fdtd-2d, gemm, gemver, gesummv, gramschmidt, ludcmp, lu, mvt, seidel-2d, symm, syr2k, syrk, trisolv, trmm); y-axis: execution time relative to native, where native is 100% (lower is better); bar segments: execution, difference between VMs, validation, compilation, VM startup.]
Figure 4. Relative execution time of the PolyBenchC benchmarks on
WebAssembly normalized to native code.
[Figure 5: scatter plot; x-axis: asm.js or native size in bytes (0-2000, with an inset detailing 0-80); y-axis: WebAssembly size in bytes; series: WebAssembly/asm.js and WebAssembly/native.]
Figure 5. Binary size of WebAssembly in comparison to asm.js and
native code.
d See Haas et al. 4 for details on the experimental setup for these measurements.
e V8 is faster on some benchmarks and SpiderMonkey on others. Neither
engine is universally faster than the other.