Tags: rust, metal, apple-silicon, wgpu-rs

Geometry vanishes intermittently with multiple draw calls on Metal using wgpu (Intel and Apple Silicon, Integrated GPU)


TL;DR

I suspect I'm not using wgpu buffer offsets correctly. Why does wgpu report a minimum uniform buffer offset alignment of 256 when each element I'm uploading is only 64 bytes? Can you help me better understand uniform buffer offsets, or point out mistakes I'm overlooking? I'd be grateful for your insights, because I'm at a loss.

Background

Based on the "Learn WGPU" tutorial by sotrh, I've been piecing together my own basic renderer. Currently, I'm relying on two render pipelines, a single uniform buffer write per frame with offsets and two draw calls within the same render pass. The only things I'm rendering are a quad and a triangle. The triangle is always drawn after the quad and the objects don't overlap.

Problem Statement

What I expect is that a yellow triangle renders next to a yellow quad on a teal background. What I get instead is the triangle flickering in and out of existence rapidly and irregularly.

Steps to Diagnose the Problem

I created a Metal frame capture in Xcode. It shows that the buffers are attached correctly, but the clip position matrix of the triangle is all zeroes. Furthermore, the highlighted transform-buffer offset is not what I expected (it is 0x100, but I expected 0x40). This led me to believe that I'm doing something wrong with the uniform buffer offsets.

Update: additional testing has confirmed that the order of draw calls (quad before triangle) is always the same, and the uniform offset is also always 0 for the quad, and 0x100 for the triangle.

(Screenshot: transform-buffer offsets and values)

Buffer Usage Code

// ...
// Bind group and buffer layout definition
let min_binding_size = wgpu::BufferSize::new(std::mem::size_of::<Mat4<f32>>() as _);  // 64 bytes
let transform_layout = BindGroupLayoutBuilder::new(&runtime, &mut database)
    .with_label("transform-layout")
    .add_bind_group_layout_entry(
        0,
        wgpu::ShaderStages::VERTEX,
        wgpu::BindingType::Buffer {
            ty: wgpu::BufferBindingType::Uniform,
            has_dynamic_offset: true,
            min_binding_size,
        },
    )
    .submit();
// ...
// Transform-buffer creation
let max_objects = gfx.limits().max_uniform_buffer_binding_size / gfx.limits().min_uniform_buffer_offset_alignment; // 256
let uniform_alignment = gfx.limits().min_uniform_buffer_offset_alignment as wgpu::BufferAddress; // 256 bytes
let buffer_size = (max_objects as wgpu::BufferAddress) * uniform_alignment; // 65536 bytes
let transform_buffer = gfx.create_buffer(
    Some("transform-buffer"),
    buffer_size,
    wgpu::BufferUsages::UNIFORM | wgpu::BufferUsages::COPY_DST,
);
// ...
// Transform-buffer update
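let uniform_alignment = gfx.limits().min_uniform_buffer_offset_alignment;  // 256 bytes
// NOTE: the byte length below assumes every element of `transforms` occupies
// `uniform_alignment` (256) bytes. If an element is smaller (e.g. a bare
// 64-byte Mat4<f32>), this slice extends past the end of `transforms`.
gfx.write_buffer(self.transform_buffer, unsafe {
    std::slice::from_raw_parts(
        transforms.as_ptr() as *const u8,
        transforms.len() * uniform_alignment as usize,
    )
});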
// ...
// Draw calls
for (i, r) in renderables.into_iter().enumerate() {
    let transform_offset = (i as wgpu::DynamicOffset) * (uniform_alignment as wgpu::DynamicOffset);  // first 0x0, then 0x100
    if r.0.materials.is_empty() {
        rp.set_pipeline(self.pipeline_wt)
            .set_bind_group(0, self.transform_bind_group, &[transform_offset])
            .set_vertex_buffer(0, r.0.mesh.vertex_buffer)
            .set_index_buffer(r.0.mesh.index_buffer)
            .draw_indexed(0..r.0.mesh.num_indices, 0, 0..1);
    } else {
        rp.set_pipeline(self.pipeline_wtm)
            .set_bind_group(0, self.transform_bind_group, &[transform_offset])
            .set_bind_group(1, r.0.materials[0].bind_group, &[])
            .set_vertex_buffer(0, r.0.mesh.vertex_buffer)
            .set_index_buffer(r.0.mesh.index_buffer)
            .draw_indexed(0..r.0.mesh.num_indices, 0, 0..1);
    }
}
// ...

Solution

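  • I ended up finding my own solution. My understanding of dynamic buffer offsets was halfway correct. What I had forgotten was that the Rust struct written to the buffer must itself be aligned to 256 bytes. With that alignment, the struct's size is rounded up to 256 bytes, so element i of the slice sits at byte offset i * 256, which matches the dynamic offset passed to set_bind_group. So all I had to add was #[repr(C, align(256))] to my struct declaration.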
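
To illustrate the fix, here is a minimal sketch that uses only the standard library. The names `AlignedTransform`, `clip_from_model`, `dynamic_offset` and `as_bytes` are hypothetical and do not come from the original code, and the 4x4 `f32` array stands in for `Mat4<f32>`. The explicit `_pad` field is an addition of mine rather than part of the fix described above: it avoids implicit padding, so that viewing the slice as bytes does not read uninitialized memory. The value 256 is the alignment reported in the question and is hard-coded here; on other devices the limit may differ.

```rust
use std::mem::{align_of, size_of};

/// Hypothetical per-object uniform. 64 bytes of matrix data plus 192 bytes
/// of explicit padding fill one 256-byte slot exactly.
#[repr(C, align(256))]
#[derive(Clone, Copy)]
pub struct AlignedTransform {
    pub clip_from_model: [[f32; 4]; 4], // stand-in for Mat4<f32>, 64 bytes
    _pad: [u8; 192],                    // explicit padding (my assumption)
}

// Compile-time check that one element fills exactly one 256-byte slot.
const _: () = assert!(
    size_of::<AlignedTransform>() == 256 && align_of::<AlignedTransform>() == 256
);

impl AlignedTransform {
    pub fn new(clip_from_model: [[f32; 4]; 4]) -> Self {
        Self { clip_from_model, _pad: [0; 192] }
    }
}

/// Dynamic offset of the `index`-th element: 0x0, 0x100, 0x200, ...
pub fn dynamic_offset(index: usize) -> u32 {
    (index * size_of::<AlignedTransform>()) as u32
}

/// Views the slice as raw bytes, e.g. for a buffer upload.
pub fn as_bytes(transforms: &[AlignedTransform]) -> &[u8] {
    // SAFETY: the struct is `repr(C)`, `Copy`, and has no implicit padding,
    // so every byte is initialized; the length is the slice's exact size.
    unsafe {
        std::slice::from_raw_parts(
            transforms.as_ptr() as *const u8,
            std::mem::size_of_val(transforms),
        )
    }
}
```

With this layout, the byte length of the upload is `transforms.len() * 256` and stays within the slice, and the second object's data starts at 0x100, which is the offset the draw loop in the question already passes for the triangle.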