Tags: multithreading, rust, webassembly, memory-corruption, rust-wasm

How to compile Rust for use with WASM's Shared Memory?


When I run a loop in different Web Workers, the loop's counter variable is shared across threads, even though it should be thread-local. It should not do this, but I don't know how to fix it.

The offending loop is in the run function of the Rust code being compiled to WASM:

#![no_main]
#![no_std]

use core::panic::PanicInfo;
use js::*;

mod js {
    #[link(wasm_import_module = "imports")]
    extern "C" {
        pub fn abort(msgPtr: usize, filePtr: usize, line: u32, column: u32) -> !;
        pub fn _log_num(number: usize);
    }
}

#[no_mangle]
pub unsafe extern "C" fn run(worker_id: i32) {
    let worker_index = worker_id as u32 - 1;
    let chunk_start = 100 * worker_index;
    let chunk_end = chunk_start + 100; //Total pixels may not divide evenly into number of worker cores.
    for n in chunk_start as usize..chunk_end as usize {
        _log_num(n);
    }
}

#[panic_handler]
unsafe fn panic(_: &PanicInfo) -> ! { abort(0, 0, 0, 0) }

run is passed the thread ID, ranging from 1 to 3 inclusive, and prints out a hundred numbers - so across all three threads I should see the numbers 0 to 299, albeit interleaved. I expect thread 1 to log 0, 1, 2..., thread 2 to log 100, 101, 102..., and thread 3 to log 200, 201, 202... If I run the functions sequentially, that is indeed what I see. But if I run them in parallel, the threads appear to share a single counter: thread 1 logs something like 1, 4, 7..., thread 2 logs 2, 6, 9..., and thread 3 logs 3, 5, 8..., until all three stop at 99. Each thread behaves as if chunk_start, chunk_end, and n were shared with the other threads.

It should not do this, because .cargo/config.toml specifies --shared-memory so the compiler should use the appropriate locking mechanisms when allocating memory.

[target.wasm32-unknown-unknown]
rustflags = [
    "-C", "target-feature=+atomics,+mutable-globals,+bulk-memory",
    "-C", "link-args=--no-entry --shared-memory --import-memory --max-memory=2130706432",
]

I know this is being picked up, because if I change the --shared-memory flag to something else, rust-lld complains that it doesn't recognise the flag.

wasm-bindgen's parallel demo works fine, so I know it's possible to do this. I just can't spot what they've set to make theirs work.
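One way to spot the difference would be to dump each module's export names and diff them, looking for the threading plumbing (__wasm_init_tls and friends). A small sketch, runnable in Node - listExports is my own helper, not an API; only WebAssembly.Module.exports is standard:

```javascript
//List the export names of a compiled WASM module, so two builds can be diffed.
function listExports(bytes) {
    const module = new WebAssembly.Module(bytes);
    return WebAssembly.Module.exports(module).map(entry => entry.name);
}

//The eight bytes below are the smallest valid (empty) module: "\0asm" plus version 1.
const emptyModule = new Uint8Array([0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00]);
console.log(listExports(emptyModule)); //[] - an empty module exports nothing.
```

Running this over sim.wasm and over the demo's .wasm file would show whether both modules carry the same TLS machinery, or whether the demo's build exports extra setup hooks.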

Perhaps it is something in the way I load my module in the web worker?

const wasmSource = fetch("sim.wasm") //kick off the request now, we're going to need it

//See message sending code for why we use multiple messages.
let messageArgQueue = [];
addEventListener("message", ({data}) => {
    messageArgQueue.push(data)
    if (messageArgQueue.length === 4) {
        self[messageArgQueue[0]].apply(0, messageArgQueue.slice(1))
    }
})

self.start = async (workerID, worldBackingBuffer, world) => {
    const wasm = await WebAssembly.instantiateStreaming(wasmSource, {
        env: { memory: worldBackingBuffer },
        imports: {
            abort: (messagePtr, locationPtr, row, column) => {
                throw new Error(`? (?:${row}:${column}, thread ${workerID})`)
            },
            _log_num: num => console.log(`thread ${workerID}: n is ${num}`),
        },
    })

    //Initialise thread-local storage, so we get separate stacks for our local variables.
    wasm.instance.exports.__wasm_init_tls(workerID-1)   

    //Loop, running the Rust logging loop when the "tick" advances.
    let lastProcessedTick = 0
    while (1) {
        Atomics.wait(world.globalTick, 0, lastProcessedTick)
        lastProcessedTick = world.globalTick[0]
        wasm.instance.exports.run(workerID)
    }
}

worldBackingBuffer here is the shared memory for the WASM module, and it's created in the main thread.
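The Atomics.wait call in the worker only sleeps while globalTick still equals lastProcessedTick; its return values make that contract easy to check in isolation. A minimal sketch of the semantics the worker loop relies on (the 4-byte buffer here is a stand-in for the shared WASM memory):

```javascript
//Atomics.wait requires an Int32Array over a SharedArrayBuffer.
const buffer = new SharedArrayBuffer(4);
const tick = new Int32Array(buffer);

//If the value at the index is NOT the expected value, wait returns immediately...
console.log(Atomics.wait(tick, 0, 123)); //"not-equal" - tick[0] is 0, not 123.

//...and if it IS the expected value, wait blocks until notified or the timeout expires.
console.log(Atomics.wait(tick, 0, 0, 0)); //"timed-out" - 0 ms timeout, nobody notified us.
```

So the worker loop wakes either because the main thread bumped the tick and notified, or immediately, because the tick already advanced while the worker was busy.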

//Let's count to 300. We'll have three web workers, each taking a third of the task. 0-100, 100-200, 200-300...

//First, allocate some shared memory. (The original task wants to share some values around.)
const memory = new WebAssembly.Memory({
    initial: 23,
    maximum: 23,
    shared: true,
})
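As a sanity check on the sizing: a WASM page is 64 KiB, so 23 pages is 1,507,328 bytes, which comfortably covers the 1,200,000-byte offset used for globalTick below. Runnable in Node:

```javascript
const PAGE_SIZE = 65536; //WASM memory is sized in 64 KiB pages.
const memory = new WebAssembly.Memory({ initial: 23, maximum: 23, shared: true });

console.log(memory.buffer.byteLength);                     //1507328 - 23 pages.
console.log(memory.buffer.byteLength === 23 * PAGE_SIZE);  //true
console.log(1200000 + 4 <= memory.buffer.byteLength);      //true - the Int32Array at 1200000 fits.
```

Note that a shared memory must specify a maximum, and its buffer is a SharedArrayBuffer rather than an ArrayBuffer - that's what lets it be posted to workers without being detached.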

//Then, allocate the data views into the memory.
//This is shared memory which will get updated by the worker threads, off the main thread.
const world = {
    globalTick: new Int32Array(memory.buffer, 1200000, 1), //Current global tick. Increment to tell the workers to count up in scratchA!
}

//Load a core and send the "start" event to it.
const startAWorkerCore = coreIndex => {
    const worker = new Worker('worker/sim.mjs', {type:'module'})
    //Marshal the "start" message across multiple postMessages, because of the following bugs:
    //1. Must transfer memory BEFORE world. https://bugs.chromium.org/p/chromium/issues/detail?id=1421524
    //2. Must transfer world BEFORE memory. https://bugzilla.mozilla.org/show_bug.cgi?id=1821582
    ;['start', coreIndex+1, memory, world].forEach(arg => worker.postMessage(arg))
}

//Now, let's start some worker threads! They will work on different memory locations, so they don't conflict.
startAWorkerCore(0) //works fine
startAWorkerCore(1) //breaks counting - COMMENT THIS OUT TO FIX COUNTING
startAWorkerCore(2) //breaks counting - COMMENT THIS OUT TO FIX COUNTING


//Advance the simulation three times. Each tick, every thread should print its hundred numbers in order.
//Thread 1 should print 0, then 1, then 2, etc. up to 99.
//Thread 2 should run from 100 to 199, and thread 3 from 200 to 299.
//But when the threads run simultaneously, all three seem to use the same counter.
setTimeout(tick, 500)
setTimeout(tick, 700)
setTimeout(tick, 900)
function tick() {
    Atomics.add(world.globalTick, 0, 1)
    Atomics.notify(world.globalTick, 0)
}
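For reference, Atomics.add returns the value that was at the index before the addition, and Atomics.notify returns how many waiting agents it woke. A sketch of the tick mechanism with no workers attached, so notify wakes nobody:

```javascript
const buffer = new SharedArrayBuffer(4);
const globalTick = new Int32Array(buffer);

console.log(Atomics.add(globalTick, 0, 1)); //0 - the OLD value, before the increment.
console.log(Atomics.load(globalTick, 0));   //1 - the new value.
console.log(Atomics.notify(globalTick, 0)); //0 - no thread was waiting on this address.
```

In the real setup the workers are parked in Atomics.wait on this same address, so each notify here would wake up to three of them.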

But this looks pretty normal. Why am I seeing memory corruption in my Rust for-loop?


Solution

  • There is some magic being done in wasm-bindgen: the module's start function is replaced/injected with code that fixes up memory for each thread before any Rust code runs. There do seem to be open issues with it, though -

    https://github.com/rustwasm/wasm-bindgen/discussions/3474

    https://github.com/rustwasm/wasm-bindgen/discussions/3487
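Per thread, that injected start has to do something roughly like the sketch below. It assumes the module was linked so that __wasm_init_tls, __tls_size, and __tls_align are reachable from JS (wasm-ld generates these symbols for --shared-memory builds, though whether they end up exported depends on link flags); makeBumpAllocator and initThread are hypothetical helpers for illustration, not wasm-bindgen's actual code:

```javascript
//Hands out non-overlapping, aligned regions of the shared linear memory.
//A real build would reserve this space at link time rather than hard-coding a start offset.
function makeBumpAllocator(start) {
    let next = start;
    return (size, align) => {
        const base = Math.ceil(next / align) * align; //round up to the alignment
        next = base + size;
        return base;
    };
}

//Give one thread its own TLS block. The crucial detail: __wasm_init_tls takes a
//POINTER to a fresh region of memory, not a thread index.
function initThread(exports, alloc) {
    const tlsBase = alloc(exports.__tls_size.value, exports.__tls_align.value);
    exports.__wasm_init_tls(tlsBase);
    return tlsBase;
}
```

wasm-bindgen's injected start also gives each thread its own shadow stack and points the stack-pointer global at it. Without a distinct stack per thread, any locals the compiler spills to linear memory land at the same addresses in every thread, which would match the symptom described in the question.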