Why does order matter in shaders?

A Quick Note

This question has the C++ tag because there are more developers working with DirectX in C++ than there are in C#. I don't believe this question is directly related to either language, but instead to the types used (which as I understand it are exactly the same), or DirectX itself and how it compiles shaders. If someone working in C++ knows a better and more descriptive answer, then I would prefer that over my own answer. I understand both languages but use C# primarily.

Overview

In an HLSL shader, when setting up my constant buffers I ran into an issue that was rather peculiar. The original constant buffers in question were set up as follows:

cbuffer ObjectBuffer : register(b0) {
    float4x4 WorldViewProjection;
    float4x4 World;
    float4x4 WorldInverseTranspose;
}

cbuffer ViewBuffer : register(b1) {
    DirectionalLight Light;
    float3 CameraPosition;
    float3 CameraUp;
    float2 RenderTargetSize;
}

If I swap the b0 and b1 registers, rendering no longer works (e1). If I leave the registers alone and instead swap the order of World and WorldViewProjection, rendering also stops working (e2). However, simply moving ViewBuffer above ObjectBuffer in the HLSL file, with no other modifications, works just fine.

Now, I expect that the register assignment is important and that register b0 requires the three properties given in that buffer, and I understand that HLSL constant buffers must be sized in 16-byte chunks. However, this leaves me with some questions.

Questions

Given that HLSL expects constant buffers to be in 16-byte chunks:

Why does the ordering in e2 matter so much?

Isn't a float4x4 the same as a Matrix type, which is essentially an array of arrays?

[ 0, 0, 0, 0 ] = 16 bytes
[ 0, 0, 0, 0 ] = 16 bytes
[ 0, 0, 0, 0 ] = 16 bytes
[ 0, 0, 0, 0 ] = 16 bytes
[    TOTAL   ] = 64 bytes

Since a float is 4 bytes on its own, this would mean a float4 is 16 bytes, and thus a float4x4 is 64 bytes. So why does the order matter if the size remained the same?
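The size arithmetic above can be checked at compile time. This is a minimal sketch in C++, assuming a 4-byte float; Float4 and Float4x4 are hypothetical stand-ins for the HLSL types, not DirectX or SharpDX types.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical CPU-side stand-ins for the HLSL types (illustrative names only).
struct Float4   { float x, y, z, w; };  // one row: 4 floats
struct Float4x4 { Float4 rows[4]; };    // four rows

// Assumption: float is 4 bytes, as on all mainstream desktop compilers.
static_assert(sizeof(float) == 4, "this sketch assumes a 4-byte float");
static_assert(sizeof(Float4) == 16, "a row should be 16 bytes");
static_assert(sizeof(Float4x4) == 64, "a matrix should be 64 bytes");

constexpr std::size_t kRowBytes    = sizeof(Float4);
constexpr std::size_t kMatrixBytes = sizeof(Float4x4);
```

If the translation unit compiles, the 16-byte and 64-byte figures hold for these stand-ins. Note that this only confirms the sizes; it says nothing yet about where each matrix sits inside the buffer.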

Why does the ObjectBuffer have to be assigned to b0 in this case instead of any other b register?

Solution

A Quick Note

I am currently working on further analysis of the problem so that I can give a more detailed and accurate answer. I will update the question and answer to reflect as much accuracy as possible as I discover more details.

Basic Answer

The exact issue with the question above (unknown at the time of posting) is that the HLSL buffers didn't match their C# representations, so reordering the variables on only one side caused rendering to fail. However, I am still unsure why this matters when the types are the same. I learned some other things along the way and have decided to post them here.
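One way to see why the order could matter even when every field has the same type is to compare byte offsets rather than sizes. The following is a minimal sketch in C++, assuming the CPU-side struct is copied into the constant buffer byte for byte and the shader reads each field at the offset implied by its declaration order. Float4x4, ObjectBufferCpu and ObjectBufferSwapped are hypothetical stand-ins, not DirectX or SharpDX types.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical 64-byte stand-in for float4x4 / Matrix.
struct Float4x4 { float m[16]; };

// Field order used on the CPU side (matches the original cbuffer).
struct ObjectBufferCpu {
    Float4x4 WorldViewProjection;
    Float4x4 World;
    Float4x4 WorldInverseTranspose;
};

// Field order the shader would assume after swapping the first two (e2).
struct ObjectBufferSwapped {
    Float4x4 World;
    Float4x4 WorldViewProjection;
    Float4x4 WorldInverseTranspose;
};

// Same total size either way...
static_assert(sizeof(ObjectBufferCpu) == sizeof(ObjectBufferSwapped),
              "reordering same-sized fields keeps the size");

// ...but WorldViewProjection lives at a different byte offset.
constexpr std::size_t kWvpOffsetCpu     = offsetof(ObjectBufferCpu, WorldViewProjection);
constexpr std::size_t kWvpOffsetSwapped = offsetof(ObjectBufferSwapped, WorldViewProjection);
constexpr std::size_t kWorldOffsetCpu   = offsetof(ObjectBufferCpu, World);
```

Under these assumptions, a shader compiled with the swapped order would read bytes 64 to 127 as WorldViewProjection, while the CPU wrote World there, so the vertices get transformed by the wrong matrix even though the buffer size never changed.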

Why Order Matters

After some further research and testing, I'm still not 100% sure of the reasoning when the types are all the same. Overall, I believe it comes down to the order of the fields the cbuffer expects versus the order of the fields in the C# struct. For example, if your cbuffer expects a bool first and then a float, declaring them in the opposite order in the struct causes issues.

// Buffer declaration in HLSL.
cbuffer MaterialBuffer : register(b0) {
    bool HasTexture;
    float SpecularPower;
    float4 Ambient;
    ...
}

// C# struct: won't work, the field order differs from the cbuffer.
public struct MaterialBuffer {
    public float SpecularPower;
    public Vector2 padding2;
    public bool HasTexture;
    private bool padding0;
    private short padding1;
    public Color4 Ambient;
    ...
}

// C# struct: works, the field order matches the cbuffer.
public struct MaterialBuffer {
    public bool HasTexture;
    private bool padding0;
    private short padding1;
    public float SpecularPower;
    public Vector2 padding2;
    public Color4 Ambient;
    ...
}
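The difference between the two structs becomes visible when the byte offsets are written out. This is a minimal sketch in C++ that mirrors only the members shown above (the elided ones are left out). It assumes the cbuffer places HasTexture at offset 0, SpecularPower at offset 4 and Ambient at offset 16; MaterialWorks and MaterialBroken are hypothetical names, and the fixed-width integer types stand in for the 1-byte C# bool and the 2-byte short.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Offsets the shader is assumed to use for the three visible cbuffer members.
constexpr std::size_t kHlslHasTexture    = 0;
constexpr std::size_t kHlslSpecularPower = 4;
constexpr std::size_t kHlslAmbient       = 16;

// Mirrors the "Works" struct: 1 + 1 + 2 bytes, then a float, then 8 bytes of padding.
struct MaterialWorks {
    std::uint8_t HasTexture;  // stand-in for the 1-byte C# bool
    std::uint8_t padding0;
    std::int16_t padding1;
    float        SpecularPower;
    float        padding2[2];  // stand-in for Vector2
    float        Ambient[4];   // stand-in for Color4
};

// Mirrors the "Won't work" struct: same members, different order.
struct MaterialBroken {
    float        SpecularPower;
    float        padding2[2];
    std::uint8_t HasTexture;
    std::uint8_t padding0;
    std::int16_t padding1;
    float        Ambient[4];
};

// The working layout lines up with every assumed shader offset.
static_assert(offsetof(MaterialWorks, HasTexture)    == kHlslHasTexture,    "HasTexture");
static_assert(offsetof(MaterialWorks, SpecularPower) == kHlslSpecularPower, "SpecularPower");
static_assert(offsetof(MaterialWorks, Ambient)       == kHlslAmbient,       "Ambient");
```

In the reordered struct, Ambient still lands at offset 16 and the total size is unchanged, but SpecularPower sits where the shader looks for HasTexture, and HasTexture ends up at offset 12, inside what the shader treats as padding.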

I also tested how the byte sizes of the types affect this. It didn't seem to change much, but here are my findings for common basic C# types:

1 Byte  : bool, sbyte, byte
2 Bytes : short, ushort
4 Bytes : int, uint, float
8 Bytes : long, ulong, double
16 Bytes: decimal

You do have to be conscious of the basic types used to construct more complex types. For example, say you have a Vector2 with an X property and a Y property. If those are floats, the Vector2 is 8 bytes, so you'll need 8 bytes of padding before the next property unless other fields help fill out the 16 bytes. If they are double or decimal types instead, the size is different and you'll need to account for that. Likewise, an HLSL bool occupies 4 bytes in a constant buffer, which is why the working struct above pads the 1-byte C# bool with 3 more bytes.
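The padding arithmetic can be sketched as a small helper. The code below is an illustration in C++ of the HLSL packing rule for scalars and vectors only: a value is placed at the current position unless it would cross a 16-byte boundary, in which case it starts at the next boundary. Matrices, arrays and nested structs follow additional rules that are not modelled here, and place_in_cbuffer and round_up_to_register are hypothetical helper names.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kRegisterBytes = 16;  // one constant register = 4 floats

// Returns the offset at which a scalar/vector of `size` bytes (at most 16) is
// placed, given that the previous member ended at `cursor`.
constexpr std::size_t place_in_cbuffer(std::size_t cursor, std::size_t size) {
    return (cursor % kRegisterBytes + size > kRegisterBytes)
               ? (cursor / kRegisterBytes + 1) * kRegisterBytes  // skip to next register
               : cursor;                                         // fits where it is
}

// Rounds a byte count up to a whole number of 16-byte registers.
constexpr std::size_t round_up_to_register(std::size_t bytes) {
    return (bytes + kRegisterBytes - 1) / kRegisterBytes * kRegisterBytes;
}

// A lone 8-byte Vector2 needs 8 more bytes to fill its register.
static_assert(round_up_to_register(8) - 8 == 8, "Vector2 padding");
```

As a usage example, take three consecutive members shaped like float3, float3, float2 (12, 12 and 8 bytes): the first is placed at 0, the second cannot fit in the 4 bytes left and moves to 16, and the third cannot fit in the 4 bytes left after offset 28 and moves to 32. A CPU-side struct would need explicit padding fields to reproduce those gaps.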

Register Assignments

I was able to solve the register issue; it comes down to how the buffers are set on the C# side. When you set the buffers, you assign each one a slot index, and the HLSL is expected to use the same indices as its register numbers.

// Buffer declarations in HLSL.
cbuffer ViewBuffer : register(b0) { ... }
cbuffer CameraBuffer : register(b1) { ... }
cbuffer MaterialBuffer : register(b2) { ... }

// Buffer assignments in C#.
context.VertexShader.SetConstantBuffer(0, viewBuffer);
context.VertexShader.SetConstantBuffer(1, cameraBuffer);
context.VertexShader.SetConstantBuffer(2, materialBuffer);

The above code works as expected since the buffers are assigned to the correct registers. However, if we change the slot for the camera buffer to 8, for example, then the cbuffer would have to be assigned to register b8 in order to work properly. The code below doesn't work for that exact reason.

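// HLSL: the shader reads CameraBuffer from register b1.
cbuffer CameraBuffer : register(b1) { ... }
// C#: the buffer is set at slot 8, so register b1 never receives it.
context.VertexShader.SetConstantBuffer(8, cameraBuffer);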
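The slot and register relationship can be modelled with a toy example. This is a sketch in C++, not the DirectX API: FakeStage and shader_sees_buffer are hypothetical names, and the model simply assumes that register bN in the shader reads whatever was bound at slot N (Direct3D 11 exposes 14 such slots per shader stage).

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t kSlotCount = 14;  // constant-buffer slots per stage in D3D11

// Toy model of one shader stage: an array of bound buffers indexed by slot.
struct FakeStage {
    std::array<const void*, kSlotCount> slots{};  // all slots start empty (null)

    // Models the CPU-side call that binds a buffer at a slot index.
    void SetConstantBuffer(std::size_t slot, const void* buffer) {
        slots.at(slot) = buffer;
    }

    // Models the shader reading register b<index>.
    const void* ReadRegister(std::size_t index) const {
        return slots.at(index);
    }
};

// True when a buffer bound at `bound_slot` is what register b<shader_register> reads.
inline bool shader_sees_buffer(std::size_t bound_slot, std::size_t shader_register) {
    static const int camera_data = 0;  // placeholder for the buffer contents
    FakeStage stage;
    stage.SetConstantBuffer(bound_slot, &camera_data);
    return stage.ReadRegister(shader_register) == &camera_data;
}
```

In this model, shader_sees_buffer(1, 1) and shader_sees_buffer(8, 8) are true, while shader_sees_buffer(8, 1) is false, which is the failing case shown above: the buffer is bound, just not at the slot the shader reads from.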