I am investigating slowness in a WebAssembly project, and I wonder if SIMD instructions are being emulated somehow. Here's a toy Rust library to exercise some SIMD operations:
```rust
use core::arch::wasm32::*;

#[no_mangle]
pub fn do_something(f: f32) -> f32 {
    let f4 = f32x4_splat(f);
    let mut a = f4;
    for _ in 0..100000 {
        a = f32x4_add(a, f4);
    }
    f32x4_extract_lane::<0>(a)
        + f32x4_extract_lane::<1>(a)
        + f32x4_extract_lane::<2>(a)
        + f32x4_extract_lane::<3>(a)
}
```
Then I build it with `cargo build --release --target wasm32-unknown-unknown`.
Finally I run it with:
```javascript
const response = await fetch(WASM_FILE);
const wasmBuffer = await response.arrayBuffer();
const wasmObj = await WebAssembly.instantiate(wasmBuffer, { env: {} });

function do_something() {
    wasmObj.instance.exports.do_something(0.00001);
    requestAnimationFrame(do_something);
}
requestAnimationFrame(do_something);
```
I suspect that the SIMD operations are being emulated, because I see this in the Chrome performance Call Tree:
If the SIMD operations were being lowered to single instructions, as I would expect, then nothing named `f32x4_add` should show up in the profile trace at all.
It's a well-known pitfall that if you don't enable the appropriate `target_feature` for SIMD intrinsics, they are not inlined, which causes major overhead: each `f32x4_add` becomes an out-of-line function call instead of a single instruction. It's even documented.

The solution is to turn the `simd128` target feature on. You can do that by passing `-C target-feature=+simd128` to rustc, for example via `RUSTFLAGS` or in `.cargo/config.toml`.
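In `.cargo/config.toml` form, that looks like this (the file lives next to `Cargo.toml`):

```toml
# .cargo/config.toml
# Enable WebAssembly SIMD for this target so the wasm32 intrinsics
# are inlined and lowered to single 128-bit instructions.
[target.wasm32-unknown-unknown]
rustflags = ["-C", "target-feature=+simd128"]
```

With the feature enabled crate-wide, `f32x4_add` should disappear from the call tree entirely. As an alternative that avoids a config change, annotating the individual functions with `#[target_feature(enable = "simd128")]` also allows the intrinsics to be inlined into them.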