I'm trying to write to an SSBO with a compute shader and read the data back on the CPU.
The compute shader is just a 1x1x1 toy example that writes 24 floats:
#version 450 core
layout(local_size_x = 1, local_size_y = 1, local_size_z = 1) in;
layout (std430, binding = 0) buffer particles {
    float Particle[];
};
void main() {
    for (int i = 0; i < 24; ++i) {
        Particle[i] = i + 1;
    }
}
This is how I run the shader and read the data:
val bufferFlags = GL_MAP_READ_BIT | GL_MAP_PERSISTENT_BIT | GL_MAP_COHERENT_BIT
val bufferSize = 24 * 4
val bufferId = glCreateBuffers()
glNamedBufferStorage(bufferId, bufferSize, bufferFlags)
val mappedBuffer = glMapNamedBufferRange(bufferId, 0, bufferSize, bufferFlags)
mappedBuffer.rewind()
val mappedFloatBuffer = mappedBuffer.asFloatBuffer()
mappedFloatBuffer.rewind()
val ssboIndex = glGetProgramResourceIndex(progId, GL_SHADER_STORAGE_BLOCK, "particles")
val props = Array(GL_BUFFER_BINDING)
val params = Array(-1)
glGetProgramResourceiv(progId, GL_SHADER_STORAGE_BLOCK, ssboIndex, props, null, params)
glBindBufferBase(GL_SHADER_STORAGE_BUFFER, params(0), bufferId)
glUseProgram(progId)
val sync = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0)
glDispatchCompute(1, 1, 1)
glClientWaitSync(sync, 0, 1000000000) match {
  case GL_TIMEOUT_EXPIRED =>
    println("Timeout expired")
  case GL_WAIT_FAILED =>
    println("Wait failed. " + glGetError())
  case _ =>
    println("Result:")
    while (mappedFloatBuffer.hasRemaining) {
      println(mappedFloatBuffer.get())
    }
}
I expect it to print the numbers 1 to 24, but instead it prints 24 zeros. From the CPU I can read the buffer just fine, and write to it as well if GL_MAP_WRITE_BIT is set. The same thing happens if I don't use DSA (glBindBuffer / glBufferStorage / glMapBufferRange instead). However, if the buffer isn't mapped while the shader runs and I only map it just before printing the contents, everything works correctly. Isn't this exactly what persistently mapped buffers are for, so that I can keep the buffer mapped while the GPU is using it?
I checked for errors with glGetError as well as with the newer debug output, but I don't get any.
Here (pastebin) is a fully working example. You need LWJGL to run it.
There are a number of problems in your code.
First, you put the fence sync before the command you want to sync with. A fence syncs with all commands issued before the fence, not after it. If you want to sync with the compute shader execution, you have to insert the fence after the dispatch call, not before.
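A minimal sketch of the corrected ordering, assuming LWJGL 3 (the GL20C, GL32C and GL43C classes), a current OpenGL 4.3+ context, and an SSBO that is already bound as in the question. The names FenceAfterDispatch, dispatchAndWait and isSignaled are made up for illustration. It also passes GL_SYNC_FLUSH_COMMANDS_BIT to glClientWaitSync so the fence is actually flushed to the GPU; with a flags value of 0 the wait can simply run into the timeout. On its own this only fixes the ordering; it does not make the shader's writes visible, which is the second problem.

```scala
import org.lwjgl.opengl.GL20C.glUseProgram
import org.lwjgl.opengl.GL32C.{GL_ALREADY_SIGNALED, GL_CONDITION_SATISFIED, GL_SYNC_FLUSH_COMMANDS_BIT, GL_SYNC_GPU_COMMANDS_COMPLETE, glClientWaitSync, glDeleteSync, glFenceSync}
import org.lwjgl.opengl.GL43C.glDispatchCompute

// Hypothetical helper object; the names are illustrative, not part of LWJGL.
object FenceAfterDispatch {
  // glClientWaitSync reports success with either of these two status values.
  def isSignaled(status: Int): Boolean =
    status == GL_ALREADY_SIGNALED || status == GL_CONDITION_SATISFIED

  // Assumes a current GL 4.3+ context and that progId's SSBO is already bound.
  def dispatchAndWait(progId: Int, timeoutNs: Long): Boolean = {
    glUseProgram(progId)
    glDispatchCompute(1, 1, 1)
    // The fence is inserted AFTER the dispatch, so it signals once the dispatch has completed.
    val sync = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0)
    // Flush so the fence is guaranteed to reach the GPU before we block on it.
    val status = glClientWaitSync(sync, GL_SYNC_FLUSH_COMMANDS_BIT, timeoutNs)
    glDeleteSync(sync)
    isSignaled(status)
  }
}
```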
Second, synchronization is not enough. Writes to an SSBO are incoherent, so you must follow the rules of incoherent memory accesses in order to make them visible to you. In this case, you need to insert an appropriate memory barrier with glMemoryBarrier between the compute operation and the point where you read from the buffer. Since you're reading the data via mapping, the proper barrier to use is GL_CLIENT_MAPPED_BUFFER_BARRIER_BIT.
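Putting both fixes together, the persistent-mapping path might look roughly like the following sketch. It assumes LWJGL 3, a current OpenGL 4.4+ context, and that mapped is the asFloatBuffer() view of the persistently mapped SSBO from the question; PersistentReadback, dispatchAndRead and readFloats are hypothetical names. The barrier is issued after the dispatch and before the fence, so the fence cannot signal until the barrier has been processed.

```scala
import java.nio.FloatBuffer
import org.lwjgl.opengl.GL20C.glUseProgram
import org.lwjgl.opengl.GL32C.{GL_ALREADY_SIGNALED, GL_CONDITION_SATISFIED, GL_SYNC_FLUSH_COMMANDS_BIT, GL_SYNC_GPU_COMMANDS_COMPLETE, glClientWaitSync, glDeleteSync, glFenceSync}
import org.lwjgl.opengl.GL42C.glMemoryBarrier
import org.lwjgl.opengl.GL43C.glDispatchCompute
import org.lwjgl.opengl.GL44C.GL_CLIENT_MAPPED_BUFFER_BARRIER_BIT

// Hypothetical helper object; the names are illustrative, not part of LWJGL.
object PersistentReadback {
  // Copies all floats through a duplicate view, so the mapped view's own
  // position is left untouched and can be read again after the next dispatch.
  def readFloats(mapped: FloatBuffer): Array[Float] = {
    val view = mapped.duplicate()
    view.rewind()
    val out = new Array[Float](view.remaining())
    view.get(out)
    out
  }

  // Assumes: current GL 4.4+ context, progId linked, SSBO bound with glBindBufferBase,
  // and `mapped` is the persistently mapped view of that SSBO.
  def dispatchAndRead(progId: Int, mapped: FloatBuffer): Option[Array[Float]] = {
    glUseProgram(progId)
    glDispatchCompute(1, 1, 1)
    // Make the shader's incoherent SSBO writes visible to client reads through the mapping.
    glMemoryBarrier(GL_CLIENT_MAPPED_BUFFER_BARRIER_BIT)
    // Fence after both the dispatch and the barrier.
    val sync = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0)
    val status = glClientWaitSync(sync, GL_SYNC_FLUSH_COMMANDS_BIT, 1000000000L)
    glDeleteSync(sync)
    if (status == GL_ALREADY_SIGNALED || status == GL_CONDITION_SATISFIED) Some(readFloats(mapped))
    else None
  }
}
```

readFloats works on a duplicate() of the mapped view, so the view's position is never consumed. The question's while (hasRemaining) get() loop would leave the view exhausted after the first read, which matters once the buffer stays mapped across many dispatches.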
Your code appears to work when you use non-persistent mapping, but this is mere appearance. It's still undefined behavior due to the improper incoherent memory access (i.e. the lack of a memory barrier). It just so happens that the UB does what you want... in that case.
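For comparison, here is a sketch of the non-persistent path done with a barrier. For reading shader writes through glMapBufferRange/glMapNamedBufferRange (or glGetBufferSubData), the barrier bit the spec describes is GL_BUFFER_UPDATE_BARRIER_BIT rather than the client-mapped bit. The sketch assumes LWJGL 3, a current OpenGL 4.5 context, and a buffer created with GL_MAP_READ_BIT; MapAfterDispatch, dispatchThenMap and byteSize are hypothetical names. No fence is used because a mapping made without GL_MAP_UNSYNCHRONIZED_BIT already waits for pending GPU work on the buffer.

```scala
import java.nio.ByteOrder
import org.lwjgl.opengl.GL20C.glUseProgram
import org.lwjgl.opengl.GL30C.GL_MAP_READ_BIT
import org.lwjgl.opengl.GL42C.{GL_BUFFER_UPDATE_BARRIER_BIT, glMemoryBarrier}
import org.lwjgl.opengl.GL43C.glDispatchCompute
import org.lwjgl.opengl.GL45C.{glMapNamedBufferRange, glUnmapNamedBuffer}

// Hypothetical helper object; the names are illustrative, not part of LWJGL.
object MapAfterDispatch {
  // Size in bytes of `floatCount` 32-bit floats (the question's 24 * 4).
  def byteSize(floatCount: Int): Long = floatCount.toLong * java.lang.Float.BYTES

  // Assumes a current GL 4.5 context and that bufferId is bound as progId's SSBO.
  def dispatchThenMap(progId: Int, bufferId: Int, floatCount: Int): Array[Float] = {
    glUseProgram(progId)
    glDispatchCompute(1, 1, 1)
    // Make the shader's SSBO writes visible to later buffer mapping / read-back calls.
    glMemoryBarrier(GL_BUFFER_UPDATE_BARRIER_BIT)
    // A synchronized (non-UNSYNCHRONIZED) map waits for the pending dispatch.
    val bytes = glMapNamedBufferRange(bufferId, 0L, byteSize(floatCount), GL_MAP_READ_BIT)
    if (bytes == null) throw new IllegalStateException("glMapNamedBufferRange failed")
    val out = new Array[Float](floatCount)
    bytes.order(ByteOrder.nativeOrder()).asFloatBuffer().get(out)
    glUnmapNamedBuffer(bufferId)
    out
  }
}
```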