I am trying to implement an in-place falling sand algorithm. Say I have a 3D texture and I want every non-zero cell to move one cell down if the cell below it is empty (0). That is easy to do.
Say 'from' is xyz and 'to' is xyz + (0, -1, 0):
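uint toValueWas;
// Write _texture[from] into _texture[to] only if _texture[to] is still 0.
InterlockedCompareExchange(_texture[to], 0, _texture[from], toValueWas);
if (toValueWas == 0)
{
    // The move succeeded, so clear the source cell (toValueWas is 0 here).
    _texture[from] = toValueWas;
}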
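For reference, the same move can be modelled on the CPU in C++, assuming one std::atomic<std::uint32_t> per cell (try_move is a hypothetical name). compare_exchange_strong plays the role of InterlockedCompareExchange: it stores the new value only if the target still holds the expected 0, and on failure it overwrites the expected value with what it actually found, much like original_value.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// compare_exchange_strong(expected, desired) stands in for
// InterlockedCompareExchange(dest, compare_value, value, original_value):
// it writes `desired` only if the cell still equals `expected`, and on
// failure it overwrites `expected` with the value it actually found.
bool try_move(std::vector<std::atomic<std::uint32_t>>& cells,
              std::size_t from, std::size_t to) {
    assert(from < cells.size() && to < cells.size());
    const std::uint32_t value = cells[from].load();
    if (value == 0) return false;            // only non-zero cells move
    std::uint32_t to_value_was = 0;          // compare_value: target must be empty
    if (!cells[to].compare_exchange_strong(to_value_was, value))
        return false;                        // target was occupied
    cells[from].store(0);                    // success: to_value_was is still 0
    return true;
}
```

For example, with a three-cell column where only index 2 is filled, try_move(column, 2, 1) moves the value down and leaves index 2 empty; a second call returns false because the source is now empty.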
This works perfectly, but what if I need more than one uint of storage per cell? Say I want a second texture, _textureB, that holds extra data and whose values I want to move in sync with the original texture. I have tried every way I can think of, but the two textures/buffers always go out of sync. For example, this doesn't work:
uint toValueWas;
InterlockedCompareExchange(_texture[to], 0, _texture[from], toValueWas);
if (toValueWas == 0)
{
    _texture[from] = toValueWas;
    // Goes out of sync with _texture:
    InterlockedExchange(_textureB[to], _textureB[from], _textureB[from]);
}
The question boils down to: is there a way to effectively do atomic swaps on more than 32 bits?
I expected to find some way to atomically swap more than 32 bits of data, but I couldn't find one.
https://forum.unity.com/threads/atomic-swap-on-more-than-one-number.1410948/
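A common workaround, which is not what this thread ended up using, is to avoid needing a multi-word atomic at all: if the target supports 64-bit atomics (D3D12 added 64-bit integer atomics in Shader Model 6.6; whether a given Unity setup exposes them is not covered here), both 32-bit values can be packed into one word and moved with a single compare-exchange. The C++ sketch below models this on the CPU, with std::atomic<std::uint64_t> standing in for a 64-bit Interlocked* operation. pack, unpack_a, unpack_b and try_move_packed are hypothetical names, and the sketch assumes one thread per source cell, as in the compute shader.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical layout: low 32 bits hold what used to live in _texture,
// high 32 bits hold the extra payload that used to live in _textureB.
// A packed word of 0 means "empty".
constexpr std::uint64_t pack(std::uint32_t a, std::uint32_t b) {
    return (static_cast<std::uint64_t>(b) << 32) | a;
}
constexpr std::uint32_t unpack_a(std::uint64_t w) { return static_cast<std::uint32_t>(w); }
constexpr std::uint32_t unpack_b(std::uint64_t w) { return static_cast<std::uint32_t>(w >> 32); }

// Moves the whole 64-bit cell from `from` to `to` if `to` is empty.
// Assumes one thread per source cell, so nobody else writes `from` while
// it is non-zero (other threads only ever write into empty cells).
bool try_move_packed(std::vector<std::atomic<std::uint64_t>>& cells,
                     std::size_t from, std::size_t to) {
    assert(from < cells.size() && to < cells.size());
    const std::uint64_t value = cells[from].load();
    if (value == 0) return false;                 // empty, nothing to move
    std::uint64_t expected = 0;                   // only claim an empty target
    if (!cells[to].compare_exchange_strong(expected, value)) return false;
    cells[from].store(0);                         // both halves leave together
    return true;
}
```

Because both halves travel in the same compare-exchange, they cannot drift apart. Packing does not by itself stop a grain from moving more than once per pass, though; that part would still need something like a per-pass flag.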
Okay, after about a week of bashing my head against this, I think I finally fixed it. I added a bit in texture A that says "has this cell been moved yet?", and a different kernel resets it at the start of every update. This limits every cell to one atomic move per kernel execution, which rules out the following situation that complicates things massively. Imagine we're moving everything one index to the right if the cell there is empty: thread 0 happens to execute first and moves [0] into [1]; thread 1 happens to execute right after, sees a value in [1], and immediately moves it again into [2]; and so on. Final working code is:
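uint _textureFrom = _texture[from];
if (_textureFrom == 0) return; // empty cell, nothing to move
// bit() and setbit() are helper functions, not HLSL intrinsics.
if (bit(_textureFrom, BEEN_MOVED_BIT) == 1) return; // already moved this update
_textureFrom = setbit(_textureFrom, BEEN_MOVED_BIT, 1);
uint toValueWas;
InterlockedCompareExchange(_texture[to], 0, _textureFrom, toValueWas);
if (toValueWas == 0)
{
    // 'to' now belongs to this thread. Copy B before freeing 'from': once
    // _texture[from] is 0, another thread may move into it and overwrite _textureB[from].
    uint __;
    InterlockedExchange(_textureB[to], _textureB[from], __);
    _texture[from] = 0;
}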
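To reason about the moved-bit approach off the GPU, here is a CPU-side C++ sketch of the kernel for a single column (index 0 is the bottom), assuming one thread per source cell and std::atomic<std::uint32_t> standing in for the texels of _texture and _textureB. BEEN_MOVED_BIT = 31, the names Column, step_cell, reset_moved_bits and update, and the "B = 100 × A" payload in the usage example are hypothetical choices for illustration. The sketch copies B before releasing the source cell, because once a[from] is 0 another thread may move into it and overwrite b[from].

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <thread>
#include <vector>

using Column = std::vector<std::atomic<std::uint32_t>>;

// Assumed: BEEN_MOVED_BIT = 31, i.e. real cell values never use the top bit.
constexpr std::uint32_t kMovedMask = 1u << 31;

// One kernel "thread": try to move cell `from` down into `from - 1`.
void step_cell(Column& a, Column& b, std::size_t from) {
    const std::size_t to = from - 1;
    std::uint32_t value = a[from].load();
    if (value == 0) return;                  // empty cell, nothing to move
    if (value & kMovedMask) return;          // already moved during this update
    value |= kMovedMask;

    std::uint32_t expected = 0;              // only claim an empty target
    if (!a[to].compare_exchange_strong(expected, value)) return;

    // `to` now belongs to this thread. Copy B before freeing `from`.
    b[to].store(b[from].load());
    a[from].store(0);
}

// Stands in for the separate reset kernel run at the start of every update.
void reset_moved_bits(Column& a) {
    for (auto& cell : a) cell.store(cell.load() & ~kMovedMask);
}

// One update: reset the flags, then run one thread per cell that has a cell below it.
void update(Column& a, Column& b) {
    assert(a.size() == b.size());
    reset_moved_bits(a);
    std::vector<std::thread> threads;
    for (std::size_t from = 1; from < a.size(); ++from)
        threads.emplace_back(step_cell, std::ref(a), std::ref(b), from);
    for (auto& t : threads) t.join();
}
```

Running enough updates on a column with five grains stacked at indices 3 to 7 should leave them at indices 0 to 4, with every occupied cell's B value still matching its A value. Starting and joining threads per pass only mimics a kernel dispatch; it is not how a real CPU simulation would be structured.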
My guess at the source of the desync is that while the Interlocked* calls themselves are perfectly atomic, the sequence as a whole (claim _texture[to], clear _texture[from], copy _textureB) is not, so another thread can act on a cell's new _texture value before its _textureB value has caught up. I'm honestly still not completely sure that explains everything, but this works. At the very least, I can say that one of the key triggers for the desync was definitely several moves happening in quick succession along a chain.
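The chain situation described earlier (thread 0 moves [0] into [1], thread 1 immediately moves it on into [2]) can be replayed deterministically to see how it desyncs _textureB. The C++ sketch below is a hypothetical three-cell, single-threaded replay; Grid, claim, copy_b and replay are made-up names, and a plain comparison stands in for the compare-exchange. It follows the order of the non-working code (claim and clear first, B copy last), with a hand-picked schedule: thread 0 claims [1] and clears [0], thread 1 then runs to completion, and only afterwards does thread 0 copy its B value.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical three-cell grid, moving right: "thread 0" owns [0], "thread 1" owns [1].
struct Grid {
    std::array<std::uint32_t, 3> a{};  // stands in for _texture
    std::array<std::uint32_t, 3> b{};  // stands in for _textureB
};

constexpr std::uint32_t kMoved = 1u << 31;  // assumed BEEN_MOVED_BIT = 31

// First half of a move: claim [from + 1] and clear [from], as the
// compare-exchange plus the _texture[from] write do. Returns false if blocked.
bool claim(Grid& g, std::size_t from, bool use_moved_bit) {
    const std::uint32_t v = g.a[from];
    if (v == 0) return false;
    if (use_moved_bit && (v & kMoved)) return false;
    if (g.a[from + 1] != 0) return false;  // the compare-exchange would fail
    g.a[from + 1] = use_moved_bit ? (v | kMoved) : v;
    g.a[from] = 0;
    return true;
}

// Second half of a move: the separate, non-atomic copy of the B payload.
void copy_b(Grid& g, std::size_t from) { g.b[from + 1] = g.b[from]; }

// Hand-picked schedule: thread 0 claims, thread 1 runs to completion,
// then thread 0 finally copies its B payload.
Grid replay(bool use_moved_bit) {
    Grid g;
    g.a[0] = 7;   // one grain ...
    g.b[0] = 42;  // ... with B payload 42
    const bool t0_claimed = claim(g, 0, use_moved_bit);
    if (claim(g, 1, use_moved_bit)) copy_b(g, 1);
    if (t0_claimed) copy_b(g, 0);
    return g;
}
```

Without the moved bit, the grain ends up in [2] while its B payload is left behind in [1], because thread 1 copied b[1] before thread 0 had written it. With the moved bit, thread 1 sees the flag and returns, so the grain stops at [1] together with its payload.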