I am writing an SSAO shader with a kernel size of 64.
SSAO fragment shader:
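const int kernelSize = 64;
float occlusion = 0.0;
// tbn, origin (view-space position of the fragment) and radius are set earlier in the shader
for (int i = 0; i < kernelSize; i++) {
    // Get sample position: rotate the kernel vector into view space, scale it by radius
    vec3 s = tbn * ubo.kernel[i].xyz;
    s = s * radius + origin;
    // Project the sample to find where to read the position texture
    vec4 offset = vec4(s, 1.0);
    offset = ubo.projection * offset;   // view space -> clip space
    offset.xy /= offset.w;              // perspective divide -> NDC [-1, 1]
    offset.xy = offset.xy * 0.5 + 0.5;  // NDC -> texture coordinates [0, 1]
    // Only the z (view-space depth) channel of the fetched texel is used
    float sampleDepth = texture(samplerposition, offset.xy).z;
    // Only count surfaces whose depth is within radius of the fragment's depth
    float rangeCheck = abs(origin.z - sampleDepth) < radius ? 1.0 : 0.0;
    occlusion += (sampleDepth >= s.z ? 1.0 : 0.0) * rangeCheck;
}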
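The projection lines in the loop turn each view-space sample into texture coordinates. As a rough illustration, here is a CPU-side C sketch of those same steps, assuming a column-major matrix layout like GLSL's mat4; the names view_to_uv, mat4, vec3 and vec2 are hypothetical helpers, not part of any library.

```c
#include <assert.h>

/* Column-major 4x4 matrix, same layout as a GLSL mat4: m[col * 4 + row]. */
typedef struct { float m[16]; } mat4;
typedef struct { float x, y, z; } vec3;
typedef struct { float u, v; } vec2;

/* CPU mirror of the shader steps:
 *   offset = projection * vec4(s, 1.0);   view space -> clip space
 *   offset.xy /= offset.w;                perspective divide -> NDC [-1, 1]
 *   offset.xy = offset.xy * 0.5 + 0.5;    NDC -> texture coordinates [0, 1]
 */
static vec2 view_to_uv(const mat4 *proj, vec3 p)
{
    const float *m = proj->m;
    float cx = m[0] * p.x + m[4] * p.y + m[8]  * p.z + m[12];
    float cy = m[1] * p.x + m[5] * p.y + m[9]  * p.z + m[13];
    float cw = m[3] * p.x + m[7] * p.y + m[11] * p.z + m[15];
    assert(cw > 0.0f); /* w must be positive: the point lies in front of the camera */
    vec2 uv = { (cx / cw) * 0.5f + 0.5f, (cy / cw) * 0.5f + 0.5f };
    return uv;
}
```

With a minimal perspective matrix (m[0] = m[5] = 1, m[11] = -1, everything else 0), the view-space point (1, 0.5, -2) maps to the texture coordinates (0.75, 0.625). A larger radius pushes s further from origin, so the coordinates read by neighbouring fragments end up further apart.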
The samplerposition texture has the format VK_FORMAT_R16G16B16A16_SFLOAT and is backed by memory allocated with VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT.
I'm using a laptop with an NVIDIA K1100M graphics card. If I run the code in RenderDoc, this shader takes 114 ms, and if I change kernelSize to 1, it takes 1 ms.
Is this texture fetch time normal, or could I have set something up wrong somewhere? For example, the layout transition might not have gone through, so the texture is still in VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL instead of VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL.
GPU memory access relies heavily on caches, and caching helps very little when fragments close to each other do not sample texels that are next to each other, which is known as a lack of spatial coherence. I would expect random access to a texture to be about 10x slower, or worse, than linear, coherent access. SSAO is very prone to this when used with large radii, because the samples taken by neighbouring fragments are scattered over a wide area of the texture.
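To put a number on "large radii", the sketch below estimates how far from the fragment, in pixels, a sample can land. It is a C approximation that assumes a standard perspective projection and only looks at the vertical offset; ssao_footprint_px and its parameters are hypothetical names.

```c
#include <assert.h>

/* Approximate vertical distance, in pixels, between a fragment and a sample
 * displaced by `radius` view-space units, for a surface `view_depth` units
 * in front of the camera.
 * proj_y_scale is projection[1][1] (1 / tan(fovy / 2) for a standard
 * perspective matrix); viewport_height is in pixels. */
static double ssao_footprint_px(double radius, double view_depth,
                                double proj_y_scale, double viewport_height)
{
    assert(view_depth > 0.0);
    double ndc_offset = radius * proj_y_scale / view_depth; /* NDC spans [-1, 1] */
    return ndc_offset * 0.5 * viewport_height;             /* NDC units -> pixels */
}
```

With a 90° vertical field of view (proj_y_scale = 1), a 1080-pixel-tall viewport, a radius of 0.5 and a surface 2 units away, samples can land up to about 135 pixels from the fragment; halving the radius halves that footprint. The footprint scales linearly with the radius, which is why shrinking it keeps the fetches of neighbouring fragments closer together.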
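I recommend using smaller radii and optimizing the texture accesses. You're fetching four 16-bit floats per sample but using only one of them. Writing just the depth (the z channel) into a separate single-channel 16-bit image, such as VK_FORMAT_R16_SFLOAT, cuts the data fetched per sample to a quarter and should give you an easy speedup of up to about 4x.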
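The 4x figure follows from the texel sizes: VK_FORMAT_R16G16B16A16_SFLOAT stores 8 bytes per texel, while a single-channel 16-bit format stores 2. The C sketch below counts the bytes the loop requests per frame before any cache hits, assuming each sample touches one texel (nearest filtering) and a hypothetical 1920x1080 render target; ssao_bytes_requested is an illustrative name.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes requested by the SSAO loop per frame, ignoring cache hits.
 * bytes_per_texel: 8 for VK_FORMAT_R16G16B16A16_SFLOAT,
 *                  2 for a single-channel 16-bit format such as VK_FORMAT_R16_SFLOAT. */
static uint64_t ssao_bytes_requested(uint32_t width, uint32_t height,
                                     uint32_t kernel_size, uint32_t bytes_per_texel)
{
    assert(bytes_per_texel > 0); /* texel size of the sampled format */
    return (uint64_t)width * height * kernel_size * bytes_per_texel;
}
```

At 1920x1080 with 64 samples this gives 1,061,683,200 bytes for the RGBA16F texture versus 265,420,800 bytes for a 16-bit single-channel image. How much of that traffic actually reaches memory depends on the cache hit rate, so the measured speedup may be smaller than 4x.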