javascript webgpu wgsl

Padding in WGSL memory layout


I'm trying to write a WGSL struct parser (a sort of webgpu-utils thing). To better understand the memory layout, I'm using the WGSL offset computer as a helper.

Given the following struct:

struct A {
  a: array<vec3f, 3>,
  b: f32
};

The layout given by the tool looks like this: [image: layout of struct A as shown by the tool]

I'm struggling with the difference between the graphical representation and the AInfo object. The diagram clearly shows that each item in the array<vec3f, 3> has 4 bytes of alignment padding and that the three items sit at offsets 0, 16 and 32. In the AInfo object, on the other hand, the array a has an offset of zero and a length of 48 bytes. Going by the diagram, I would expect the object representation to be something like:

const AInfo = {
  a: [
    {type: Float32Array, byteOffset: 0, length: 4}, // 16 bytes
    {type: Float32Array, byteOffset: 16, length: 4}, // 16 bytes
    {type: Float32Array, byteOffset: 32, length: 4}, // 16 bytes
  ],
  b: {type: Float32Array, byteOffset: 48, length: 1}
};

What am I missing here?


Solution

  • You almost never want an array of vec3's in JavaScript. It's too small of a unit, so if you have array<vec3f, 10000> you really don't want it to create 10000 Float32Array views. So the offset computer, or rather the webgpu-utils library it's based on, doesn't create individual views for elements of base types, because that would waste so much memory and be so slow as to be useless.

    Imagine if array<f32, 10> made

    [
       new Float32Array(ab, 0, 1),
       new Float32Array(ab, 4, 1),
       new Float32Array(ab, 8, 1),
       new Float32Array(ab, 12, 1),
       new Float32Array(ab, 16, 1),
       new Float32Array(ab, 20, 1),
       new Float32Array(ab, 24, 1),
       new Float32Array(ab, 28, 1),
       new Float32Array(ab, 32, 1),
       new Float32Array(ab, 36, 1),
    ]
    

    Those views are probably 128-256 bytes each, maybe more. So you'd end up using multiple kilobytes of memory to represent 10 f32s.

    See: https://github.com/greggman/webgpu-utils/tree/dev#the-first-level-of-an-array-of-intrinsic-types-is-flattened

    I can add an option to not flatten those, but it's arguably bad practice. Technically a mat4x4f is an array of 4 vec4fs, but I doubt anyone wants a mat4x4f represented by 4 views.
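For illustration, here is a minimal sketch of how a single flattened view can still be used to reach individual elements. It is not the webgpu-utils implementation. It assumes the WGSL layout rules for vec3f (size 12, alignment 16), and `roundUp` and `vec3At` are hypothetical helpers made up for this example.

```javascript
// WGSL layout facts for vec3f: 12 bytes of data, aligned to 16 bytes.
const VEC3F_SIZE = 12;
const VEC3F_ALIGN = 16;
const F32_SIZE = 4;

// Hypothetical helper: round n up to the next multiple of align.
const roundUp = (align, n) => Math.ceil(n / align) * align;

// struct A { a: array<vec3f, 3>, b: f32 };
const COUNT = 3;
const stride = roundUp(VEC3F_ALIGN, VEC3F_SIZE);              // bytes per array element
const arrayByteSize = COUNT * stride;                         // total bytes of member a
const bOffset = roundUp(F32_SIZE, arrayByteSize);             // byte offset of member b
const structSize = roundUp(VEC3F_ALIGN, bOffset + F32_SIZE);  // struct size, padded to its alignment

const buffer = new ArrayBuffer(structSize);

// One view for the whole array (padding included) instead of one view per element.
const a = new Float32Array(buffer, 0, arrayByteSize / F32_SIZE);
const b = new Float32Array(buffer, bOffset, 1);

// Hypothetical helper: a 3-float window onto element `index` of the flat view.
// It is created on demand, so nothing is allocated for elements you never touch.
function vec3At(view, index) {
  const start = index * (stride / F32_SIZE);
  return view.subarray(start, start + 3);
}

vec3At(a, 1).set([1, 2, 3]); // writes floats 4..6, leaves the padding float alone
b[0] = 42;
```

With this layout the element offsets come out as 0, 16 and 32, matching the diagram, while only two views exist up front. You can also skip the helper and index the flat view directly with `a[index * 4 + component]`.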