I am trying to write an HLSL compute shader that writes to the red channel of a RWTexture2D, but when I read the value back I get inconsistent results depending on whether I place the assignment before or inside an arbitrary if-statement.
A concrete example: I am only dispatching a single thread with shader.Dispatch(1, 1, 1).
When reading from the AppendStructuredBuffer after the dispatch, the following code returns a single entry with a value of 255:
RWTexture2D<float4> gridModel;
AppendStructuredBuffer<float> outlierIndeces;

[numthreads(1, 1, 1)]
void IdentifyBorderPoints(uint3 id : SV_DispatchThreadID)
{
    if (id.x == 0)
    {
        gridModel[id.xy] = float4(255, 0, 0, 0);
        outlierIndeces.Append(gridModel[id.xy].r);
    }
}
But moving the assignment outside of the if-statement makes it return 1 instead. However, from what I can see, this should still be 255:
RWTexture2D<float4> gridModel;
AppendStructuredBuffer<float> outlierIndeces;

[numthreads(1, 1, 1)]
void IdentifyBorderPoints(uint3 id : SV_DispatchThreadID)
{
    gridModel[id.xy] = float4(255, 0, 0, 0);
    if (id.x == 0)
    {
        outlierIndeces.Append(gridModel[id.xy].r);
    }
}
As a sanity check, commenting out the assignment makes it return 0:
RWTexture2D<float4> gridModel;
AppendStructuredBuffer<float> outlierIndeces;

[numthreads(1, 1, 1)]
void IdentifyBorderPoints(uint3 id : SV_DispatchThreadID)
{
    //gridModel[id.xy] = float4(255, 0, 0, 0);
    if (id.x == 0)
    {
        outlierIndeces.Append(gridModel[id.xy].r);
    }
}
I delved deeper into the compiled code, and there I also see a difference. The following compiled code is from the example where the assignment happens before the if-statement:
// Input signature:
//
// Name Index Mask Register SysValue Format Used
// -------------------- ----- ------ -------- -------- ------- ------
// no Input
//
// Output signature:
//
// Name Index Mask Register SysValue Format Used
// -------------------- ----- ------ -------- -------- ------- ------
// no Output
cs_5_0
dcl_globalFlags refactoringAllowed
dcl_uav_structured u0, 4
dcl_uav_typed_texture2d (float,float,float,float) u1
dcl_input vThreadID.xy
dcl_temps 2
dcl_thread_group 1, 1, 1
0: store_uav_typed u1.xyzw, vThreadID.xyyy, l(255.000000,0,0,0)
1: if_z vThreadID.x
2: mov r0.x, l(0)
3: mov r0.yzw, vThreadID.yyyy
4: ld_uav_typed_indexable(texture2d)(float,float,float,float) r0.x, r0.xyzw, u1.xyzw
5: imm_atomic_alloc r1.x, u0
6: store_structured u0.x, r1.x, l(0), r0.x
7: endif
8: ret
// Approximately 0 instruction slots used
And this is the compiled code for the example where the assignment happens inside the if-statement:
// Input signature:
//
// Name Index Mask Register SysValue Format Used
// -------------------- ----- ------ -------- -------- ------- ------
// no Input
//
// Output signature:
//
// Name Index Mask Register SysValue Format Used
// -------------------- ----- ------ -------- -------- ------- ------
// no Output
cs_5_0
dcl_globalFlags refactoringAllowed
dcl_uav_structured u0, 4
dcl_uav_typed_texture2d (float,float,float,float) u1
dcl_input vThreadID.xy
dcl_temps 1
dcl_thread_group 1, 1, 1
0: if_z vThreadID.x
1: mov r0.x, l(0)
2: mov r0.yzw, vThreadID.yyyy
3: store_uav_typed u1.xyzw, r0.xyzw, l(255.000000,0,0,0)
4: imm_atomic_alloc r0.x, u0
5: store_structured u0.x, r0.x, l(0), l(255.000000)
6: endif
7: ret
// Approximately 0 instruction slots used
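I don't fully understand the compiled code, but it seems like the value 255 is written directly to the structured buffer only in the case where the assignment happens inside the if-statement. In the other case, the value is first loaded back from the texture (ld_uav_typed_indexable) and then stored.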
Additional info: I am using Unity's shader compiler to compile the code. The version of Unity I am using is 2018.3.14f1.
I received some assistance and we figured out the problem. In Unity I initialized gridModel using a RenderTexture with the following settings:
texture = new RenderTexture(512, 512, 1, RenderTextureFormat.ARGB32);
texture.depth = 0;
texture.enableRandomWrite = true;
texture.Create();
And then I send it to the shader:
int identifyBorderPointsKernel = cShader.FindKernel("IdentifyBorderPoints");
cShader.SetTexture(identifyBorderPointsKernel, "gridModel", texture);
But the problem occurred because each channel in RenderTextureFormat.ARGB32 is an 8-bit normalized channel that holds values in the range [0, 1], so the 255 I wrote was clamped to 1. By replacing the 255 with a value inside that range, such as 0.4, I got the same result regardless of where I did the assignment.
The reason I saw a difference depending on which side of the if-statement the assignment was on is that, when the assignment happened inside the if-statement, the compiler wrote the constant directly to the append buffer and didn't bother to load it from the texture first. Hence the value was never clamped.
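To make the clamping concrete, here is a minimal C# sketch of roughly what happens to a value written to, and then read back from, an 8-bit normalized channel. The helper name QuantizeUnorm8 is hypothetical, and the rounding rule (clamp to [0, 1], scale by 255, round to nearest) is an assumption about how the conversion typically behaves, not something I verified against Unity or the GPU driver:

```csharp
using System;

static class Unorm8Demo
{
    // Hypothetical helper: approximates a write to, then a read from,
    // an 8-bit normalized channel.
    static float QuantizeUnorm8(float value)
    {
        // Values outside [0, 1] cannot be represented, so they are clamped.
        float clamped = Math.Min(Math.Max(value, 0f), 1f);
        // Assumed rounding rule: scale to 0..255 and round to nearest.
        byte stored = (byte)Math.Floor(clamped * 255f + 0.5f);
        // Reading back maps the stored byte to [0, 1] again.
        return stored / 255f;
    }

    static void Main()
    {
        Console.WriteLine(QuantizeUnorm8(255f)); // prints 1
        Console.WriteLine(QuantizeUnorm8(0.4f)); // a value close to 0.4
    }
}
```

This matches what I observed: 255 comes back as 1, while a value already inside [0, 1] survives more or less unchanged.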
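If values outside [0, 1] really need to be stored in the texture, one possible alternative (not something I tested as part of the fix above) is to create the RenderTexture with a floating-point format. The sketch below assumes RenderTextureFormat.ARGBFloat is supported on the target platform; the class name GridModelSetup and the public cShader field are hypothetical scaffolding around the calls from my original setup:

```csharp
using UnityEngine;

public class GridModelSetup : MonoBehaviour
{
    // Hypothetical field: the compute shader containing IdentifyBorderPoints.
    public ComputeShader cShader;

    private RenderTexture texture;

    void Start()
    {
        // Assumption: ARGBFloat (32-bit float per channel) instead of ARGB32,
        // so a written value such as 255 is not clamped to [0, 1].
        texture = new RenderTexture(512, 512, 0, RenderTextureFormat.ARGBFloat);
        texture.enableRandomWrite = true;
        texture.Create();

        int identifyBorderPointsKernel = cShader.FindKernel("IdentifyBorderPoints");
        cShader.SetTexture(identifyBorderPointsKernel, "gridModel", texture);
    }

    void OnDestroy()
    {
        // Release the GPU resource when the component goes away.
        if (texture != null)
        {
            texture.Release();
        }
    }
}
```

Keeping ARGB32 and writing values in the [0, 1] range, as described above, remains the simpler option when 8 bits of precision per channel is enough.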