I am using a compute shader to calculate a sum (of type float), like this:
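#version 320 es
// 16x16 is an assumed work group size: 640x480 invocations is far more than
// implementations allow per work group. Dispatch 40x30 groups to cover 640x480.
layout(local_size_x = 16, local_size_y = 16, local_size_z = 1) in;
layout(std430, binding = 0) buffer OutputData {
    float sum[];
};
uniform sampler2D texture_1;
void main()
{
    // gl_LocalInvocationIndex is a scalar uint, so use the 2D global ID instead
    vec2 texcoord = vec2(float(gl_GlobalInvocationID.x) / 640.0,
                         float(gl_GlobalInvocationID.y) / 480.0);
    float val = textureLod(texture_1, texcoord, 0.0).r;
    // Synchronization is needed here: every invocation reads and writes sum[0]
    sum[0] = sum[0] + val;
    // Here I want the sum of all values in the first channel of texture_1
}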
I know there are atomic operations such as atomicAdd(), but they do not support float parameters, and barrier() does not seem to solve my problem. Maybe I could encode the float as an int, or is there a simpler way to solve this?
Atomics generally perform very poorly, especially when heavily contended by parallel access from many threads, so I wouldn't recommend them for this use case.
To keep parallelism here you really need some kind of multi-pass reduction strategy. Pseudo code, something like this:
array_size = N
data = input_array
while array_size > 1:
    M = ceil(array_size / 2)
    spawn pass with M threads
    thread i (0 <= i < M): out[i] = data[2*i] + data[2*i+1]   (treat an out-of-range element as 0)
    array_size = M
    data = out
This is a simple 2:1 reduction, so it needs O(log2(N)) passes, but you could do more reduction per pass to cut both the number of passes and the memory bandwidth spent on intermediate storage. For a GPU using textures as input, 4:1 is quite nice: you can use textureGather, or even a simple linear filter, to load multiple samples in a single texturing operation (a linear-filtered sample taken at the corner shared by four texels returns their average, so multiply by 4 to get the sum).
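To make the pass structure concrete, here is a minimal CPU-side sketch in Python that models the reduction described above. It is not GPU code: on the GPU each call to reduce_pass would correspond to one dispatch (or draw), with each list element handled by its own thread. The names reduce_pass, multipass_sum and factor are hypothetical, chosen only for this illustration, and the input data is made up.

```python
import math


def reduce_pass(data, factor):
    """Model one pass: 'thread' i sums `factor` adjacent inputs.

    Inputs past the end of the array simply contribute nothing,
    which is equivalent to treating them as 0.
    """
    out_size = math.ceil(len(data) / factor)
    return [sum(data[i * factor:(i + 1) * factor]) for i in range(out_size)]


def multipass_sum(data, factor=2):
    """Run passes until one value is left; return (total, pass_count)."""
    if factor < 2:
        raise ValueError("factor must be at least 2")
    data = list(data)
    if not data:
        return 0.0, 0
    passes = 0
    while len(data) > 1:
        data = reduce_pass(data, factor)  # one dispatch per pass on a GPU
        passes += 1
    return data[0], passes


# Made-up input: one value per texel of a 640x480 texture.
values = [float(i % 7) for i in range(640 * 480)]

total2, passes2 = multipass_sum(values, factor=2)  # 2:1 reduction
total4, passes4 = multipass_sum(values, factor=4)  # 4:1 reduction
print(total2, passes2)
print(total4, passes4)
```

For the 640x480 case, the 2:1 variant takes 19 passes and the 4:1 variant takes 10, which is where the saving in intermediate reads and writes comes from. In a real shader the order of floating-point additions differs from a sequential loop, so the result may differ slightly from a CPU sum for general float data.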