The results show that there are still some differences between
the two engines due to their different code generators.e We measured a VM
startup time of 18 ms for V8 and 30 ms for SpiderMonkey.
These times are included along with compilation times as
bars stacked on top of the execution time of each benchmark. Overall, the results show that WebAssembly is competitive with native code, with seven benchmarks within
10% of native and nearly all of them within 2× of native.
We also measured the execution time of the PolyBenchC
benchmarks running on asm.js. On average, WebAssembly is
33.7% faster than asm.js. Validation in particular is significantly
more efficient. For SpiderMonkey, WebAssembly validation
takes less than 3% of the time of asm.js validation. In V8,
memory consumption of WebAssembly validation is less
than 1% of that for asm.js validation.
Code size. Figure 5 compares code sizes between
WebAssembly, minified asm.js, and x86-64 native code. For
the asm.js comparison we use the Unity benchmarks; for the
native code comparison, the PolyBenchC and SciMark
benchmarks. For each function in these benchmarks, a
yellow point is plotted at ⟨size_asm.js, size_wasm⟩ and a blue point at
⟨size_x86, size_wasm⟩. Any point below the diagonal represents code
for which WebAssembly is smaller than the corresponding
other representation. On average, WebAssembly code is
62.5% the size of asm.js code, and 85.3% the size of native x86-64 code.
WebAssembly's structured control flow makes compilation simpler
and more efficient, avoiding the limitation of JITs that
usually do not support irreducible control flow. Reusing the
advanced JITs from four different JavaScript engines has
been a resounding success that allowed all engines to
achieve high performance in a short time.
Bounds checks. By design, all memory accesses in
WebAssembly can be guaranteed safe with a single dynamic
bounds check, which amounts to checking the address
against the current size of the memory. An engine will allocate the memory in a large contiguous range beginning at
some (possibly non-deterministic) base in the engine’s
process, so that all access amounts to a hardware address
base+addr. While base can be stored in a dedicated machine
register for quick access, a more aggressive strategy is to
specialize the machine code for each instance to a specific
base, embedding it as a constant directly into the code,
freeing a register. Although the base may change when the
memory is grown dynamically, it changes so infrequently that it
is affordable to patch the machine code when it does.
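The checked-access strategy described above can be sketched in C (a minimal illustration with hypothetical names; a real engine emits this check as machine code, and the base-specialization variant would bake `base` into the generated code instead of loading it from the instance):

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-instance state: one contiguous linear memory. */
typedef struct {
    uint8_t *base;   /* start of the contiguous allocation */
    uint32_t size;   /* current memory size in bytes       */
} Instance;

/* A single dynamic bounds check against the current memory size.
 * Returns 0 on success, -1 if the access would trap. */
int load_u32_checked(const Instance *inst, uint32_t addr, uint32_t *out) {
    if (addr > inst->size || inst->size - addr < sizeof(uint32_t))
        return -1;                                 /* out of bounds: trap */
    memcpy(out, inst->base + addr, sizeof(uint32_t)); /* base + addr */
    return 0;
}
```

Note that the check is phrased to avoid integer overflow: `addr + 4` is never computed directly, so an address near `UINT32_MAX` cannot wrap around the check.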
On 64-bit platforms, an engine can make use of virtual
memory to eliminate bounds checks for memory accesses
altogether. The engine simply reserves 8GB of virtual address
space and marks as inaccessible all pages except the valid portion of memory near the start. Since WebAssembly memory
addresses and offsets are 32-bit integers plus a static constant,
by construction no access can be further than 8GB away from
base. Consequently, the JIT can simply emit plain load/store
instructions and rely on hardware protection mechanisms to
catch out-of-bounds accesses.
Parallel and streaming compilation. With ahead-of-time
compilation it is a clear performance win to parallelize
compilation of WebAssembly modules, dispatching individual functions to different threads. For example, both V8
and SpiderMonkey achieve a 5-6× improvement in compilation speed with eight compilation threads. In addition, the
design of the WebAssembly binary format supports
streaming compilation, in which an engine can start compiling individual
functions before the full binary has been loaded. When
combined with parallelization, this minimizes cold startup.
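Because function bodies compile independently, the dispatch can be as simple as worker threads claiming function indices from a shared counter. A minimal pthreads sketch (all names hypothetical; `compile_function` stands in for actual code generation):

```c
#include <pthread.h>
#include <stdatomic.h>

#define NUM_FUNCS 64               /* hypothetical module with 64 functions */

static atomic_int next_func;       /* index of the next function to claim */
static int compiled[NUM_FUNCS];    /* stands in for the generated code    */

/* Stand-in for translating one function body to machine code. */
static void compile_function(int idx) { compiled[idx] = 1; }

/* Each worker repeatedly claims one function index and compiles it. */
static void *worker(void *arg) {
    (void)arg;
    for (;;) {
        int idx = atomic_fetch_add(&next_func, 1);
        if (idx >= NUM_FUNCS)
            return NULL;           /* no work left */
        compile_function(idx);
    }
}

/* Compile all functions of a module on nthreads parallel workers. */
void compile_module_parallel(int nthreads) {
    pthread_t tid[16];
    if (nthreads > 16) nthreads = 16;
    atomic_store(&next_func, 0);
    for (int i = 0; i < nthreads; i++)
        pthread_create(&tid[i], NULL, worker, NULL);
    for (int i = 0; i < nthreads; i++)
        pthread_join(tid[i], NULL);
}
```

Streaming fits naturally on top of this scheme: instead of a pre-filled index range, workers would claim function bodies as they arrive off the wire.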
Code caching. Besides cold startup, warm startup time is
important, as users will likely visit the same Web pages
repeatedly. Engines can compile WebAssembly
modules and store their compiled representation as an
opaque blob, so that a Web page can first
query IndexedDB for a cached version of their WebAssembly
module before downloading and compiling it. In V8 and
SpiderMonkey, this mechanism can offer an order-of-magnitude improvement in warm startup time.
5.1. Measurements
Execution. Figure 4 shows the execution time of the
PolyBenchC benchmark suite running on WebAssembly on
both V8 and SpiderMonkey normalized to native execution.d
Times for both engines are shown as stacked bars.
Figure 4. Relative execution time of the PolyBenchC benchmarks on
WebAssembly normalized to native code.
Figure 5. Binary size of WebAssembly in comparison to asm.js and
native x86-64 code.
d See Haas et al. 4 for details on the experimental setup for these measurements.
e V8 is faster on some benchmarks and SpiderMonkey on others. Neither
engine is universally faster than the other.