WGSL atomics with multiple compute passes

I'm having an issue with atomics in wgpu / WGSL but I'm not sure if it's due to a fundamental misunderstanding or a bug in my code.

I have an input array declared in WGSL as

struct FourTileUpdate {
  // (u32 = 4 bytes)
  data: array<u32, 9>
};

@group(0) @binding(0) var<storage, read> tile_updates : array<FourTileUpdate>;

I'm limiting the size of this array to around 5 MB, but sometimes I need to transfer more than that for a single frame, so I use multiple command encoders and compute passes.
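To make the batching concrete, here is a minimal CPU-side sketch of splitting a frame's updates into per-pass chunks. It assumes the budget is exactly 5 MiB (the post only says "around 5MB"); `MAX_BYTES_PER_PASS`, `updates_per_pass` and `split_into_passes` are hypothetical names, not part of wgpu or of the code in the question.

```rust
/// Mirrors the WGSL struct: nine u32 words, 36 bytes in total.
#[repr(C)]
#[derive(Clone, Copy, Debug, Default)]
struct FourTileUpdate {
    data: [u32; 9],
}

/// Assumed per-pass budget; the real limit is only described as "around 5MB".
const MAX_BYTES_PER_PASS: usize = 5 * 1024 * 1024;

/// Number of whole updates that fit into one pass's buffer.
fn updates_per_pass() -> usize {
    MAX_BYTES_PER_PASS / std::mem::size_of::<FourTileUpdate>()
}

/// Splits a frame's updates into slices, one per encoder / compute pass.
fn split_into_passes(updates: &[FourTileUpdate]) -> Vec<&[FourTileUpdate]> {
    updates.chunks(updates_per_pass()).collect()
}

fn main() {
    let updates = vec![FourTileUpdate::default(); 300_000];
    let passes = split_into_passes(&updates);
    println!(
        "{} updates per pass, {} passes",
        updates_per_pass(),
        passes.len()
    );
}
```

Each slice would then be uploaded and dispatched in its own compute pass, so two updates for the same tile can end up in different passes.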

Each "tile update" has an associated position (x & y) and a ms_since_epoch property that indicates when the tile update was created. Tile updates get written to a texture.

I don't want to overwrite newer tile updates with older tile updates, so in my shader I have a guard:

storageBarrier();
let previous_timestamp_value = atomicMax(&last_timestamp_for_tile[x + y * r_locals.width], ms_since_epoch);
if (previous_timestamp_value > ms_since_epoch) {
  return;
}
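For reference, the intended behaviour of this guard can be modelled on the CPU. The sketch below is only an analogy: it uses Rust's `AtomicU32::fetch_max`, which, like WGSL's `atomicMax`, stores the larger value and returns the previous one. The helper name `passes_guard` is hypothetical, and treating the timestamp as a `u32` is an assumption, since the post does not show its type.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// CPU stand-in for the shader guard (hypothetical helper).
/// Returns true when the update is allowed to write its tile.
fn passes_guard(last_timestamp: &AtomicU32, ms_since_epoch: u32) -> bool {
    // Like atomicMax: store max(old, new) and return the old value.
    // WGSL atomics use relaxed ordering, hence Ordering::Relaxed here.
    let previous = last_timestamp.fetch_max(ms_since_epoch, Ordering::Relaxed);
    // Same test as the shader, inverted: bail out only if a newer update was seen.
    previous <= ms_since_epoch
}

fn main() {
    let slot = AtomicU32::new(0);
    assert!(passes_guard(&slot, 200)); // first update for this tile
    assert!(!passes_guard(&slot, 100)); // older update is rejected
    assert!(passes_guard(&slot, 200)); // equal timestamp still passes
    assert!(passes_guard(&slot, 300)); // newer update passes
    assert_eq!(slot.load(Ordering::Relaxed), 300);
}
```

This is the outcome I expect on the GPU as well: once a newer timestamp has been recorded for a tile, any older update for that tile should return early.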

However, something is going wrong and older tile updates are overwriting newer tile updates. I can't reproduce this on Windows / Vulkan, but it consistently happens on macOS / Metal. Here's an image of the rendered texture; it should be completely green, without the occasional red and black pixels:

[Image: rendered texture]

A few questions:

is execution order guaranteed to be the same as the order of the command encoder constructions?
do storageBarrier() and atomics work across all invocations in a single frame or just the compute pass?

I tried submitting each encoder with queue.submit(Some(encoder.finish())) before creating the next encoder for the frame, and even waiting for the queue to finish processing for each submitted encoder with

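use std::sync::mpsc;

let (tx, rx) = mpsc::channel();
queue.on_submitted_work_done(move || {
  // Signal the waiting thread once the GPU has finished the submitted work.
  tx.send(()).unwrap();
});
// Block until the device has processed all outstanding work.
device.poll(wgpu::Maintain::Wait);
rx.recv().unwrap();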

// ... loop back and create & submit next encoder for current frame

but that didn't work either.

Solution

Good questions!

is execution order guaranteed to be the same as the order of the command encoder constructions?

I believe that is the case, but when I checked, the spec turned out to be unclear about this. I filed https://github.com/gpuweb/gpuweb/issues/3809 to get it fixed.

Further, I believe the intent is that all memory accesses (e.g. to storage buffers) from one GPU command will complete before the next GPU command begins. So the effect of any writes in one command will be visible in the next command (read-after-write hazard). Also, a write in a later command will not be visible in an earlier command (write-after-read hazard).

do storageBarrier() and atomics work across all invocations in a single frame or just the compute pass?

Another good question. storageBarrier() only synchronizes invocations within a single workgroup. This may be surprising, but it is due to a limitation on some platforms.

For further details, see https://github.com/gpuweb/gpuweb/issues/3774
This will be a FAQ because it is surprising, and subtle!

Update: I suspect the bad behaviour you're seeing is caused by storageBarrier() not working across workgroups. That is a limitation in Metal.