How to convert global_invocation_id to clip space coordinates? (To convert depth buffer to world space position)

I am writing a compute shader that runs some function for every pixel in a framebuffer. To simplify this question, I'm using a workgroup size of 1.

@compute
@workgroup_size(1, 1)
fn main(@builtin(global_invocation_id) gid: vec3u) { ... }

Dispatching via pass.dispatch_workgroups(screen_width, screen_height, 1).

That means my main function is called with gid = (x, y, 0) for all 0 <= x < screen_width and 0 <= y < screen_height. Specifically, the bottom right pixel will have gid = (screen_width - 1, screen_height - 1, 0), right?

I have a depth buffer from a previous pass. I want to convert those depth values to world coordinates. And for that I need to convert gid into clip space coordinates. Here is what I want to do:

let depth = textureLoad(depth_buffer, gid.xy, 0);
let clip_space = ?????;
let world_space_hc = inverse_view_proj_matrix * vec4(clip_space, depth, 1.0);
let world_space = world_space_hc.xyz / world_space_hc.w;

What to write in place of ???? is my question! My naive approach was:

let uv = vec2f(gid.xy) / screen_size;
let clip_space = vec2(uv.x, 1.0 - uv.y) * 2.0 - 1.0;

The second line is correct, I think. And this seems to work. But I am worried that this is slightly incorrect. In particular, the bottom right pixel with gid.xy = (screen_width - 1, screen_height - 1) will NOT map to uv = (1, 1) with this algorithm. And that seems wrong.

I've been reading the WebGPU spec (in particular the coordinate system and rasterization parts), but I'm still not 100% clear. I think that corner pixels "look along" the edges of the view frustum exactly. This would mean the correct conversion is:

let uv = vec2f(gid.xy) / (screen_size - vec2f(1.0));

Is that correct?

A third option I could see is:

let uv = (vec2f(gid.xy) + vec2f(0.5)) / screen_size;

I'd greatly appreciate an answer finally explaining this to me.

Solution

I think it's easier to visualize texture coordinates (and clip space coordinates) with a small example. Imagine you have a 3x2 texture (or canvas).

These are the clip space coordinates, assuming your viewport setting matches the texture size (the default):


                        +1,+1
  +-------+-------+-------+
  |       |       |       |
  |       |       |       |
  |       |       |       |
  +-------+-------+-------+
  |       |       |       |
  |       |       |       |
  |       |       |       |
  +-------+-------+-------+
-1,-1

Looking further, the clip space coordinates of each texel's center are

  clipSpace = (texelPosition + 0.5) / textureSize * 2 - 1


                        +1,+1
  +-------+-------+-------+     
  |       |       |       |   
  |   d   |   e   |   f   |    f = +0.666..., +0.5
  |       |       |       |    e =  0       , +0.5
  +-------+-------+-------+    d = -0.666..., +0.5
  |       |       |       |    c = +0.666..., -0.5
  |   a   |   b   |   c   |    b =  0       , -0.5
  |       |       |       |    a = -0.666..., -0.5
  +-------+-------+-------+
-1,-1
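
As a rough check, the values in the diagram can be reproduced with a small WGSL helper. This is only a sketch: the function name texel_center_to_clip is made up for illustration, and it assumes texel (0, 0) is the bottom-left texel, as drawn above.

```wgsl
// Hypothetical helper (name chosen for this sketch only).
// Maps an integer texel position to the clip space position of its center,
// treating texel (0, 0) as the bottom-left texel, as in the diagram.
fn texel_center_to_clip(texel: vec2u, size: vec2u) -> vec2f {
    return (vec2f(texel) + vec2f(0.5)) / vec2f(size) * 2.0 - 1.0;
}

// For the 3x2 example:
//   texel_center_to_clip(vec2u(0, 0), vec2u(3, 2)) is a = (-0.666..., -0.5)
//   texel_center_to_clip(vec2u(1, 1), vec2u(3, 2)) is e = ( 0,        +0.5)
//   texel_center_to_clip(vec2u(2, 1), vec2u(3, 2)) is f = (+0.666..., +0.5)
```

Note that no texel center lands exactly on -1 or +1. Those values are the outer edges of the edge texels, not their centers.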

Compare the spec on rasterization, which specifies the pixel centers as the relevant points:

Fragments are associated with pixel centers. That is, all the points with coordinates C, where fract(C) = vector2(0.5, 0.5) in the framebuffer space, enclosed into the polygon, are included.

So, assuming your depth texture covers the entire clip space:

   let size = vec2f(textureDimensions(depth_buffer));
   let clip_space = (vec2f(gid.xy) + 0.5) / size * 2.0 - 1.0;

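Whether or not you flip Y is up to you. There's no inherent mapping from invocation IDs to anything. That said, if gid.y indexes rows of a depth texture written by a render pass, row 0 is the top of the framebuffer while clip space Y points up, so the 1.0 - uv.y flip from the question is what keeps the two consistent.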
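
Putting the pieces together with the code from the question, a minimal sketch of the whole shader might look like the following. The Camera struct and the group/binding indices are assumptions made for this example, not something the question specifies. The Y flip assumes the depth texture was written by a render pass, so row 0 is the top of the image.

```wgsl
// Hypothetical uniform layout and binding indices (assumptions for this sketch).
struct Camera {
    inverse_view_proj_matrix: mat4x4f,
}

@group(0) @binding(0) var depth_buffer: texture_depth_2d;
@group(0) @binding(1) var<uniform> camera: Camera;

@compute
@workgroup_size(1, 1)
fn main(@builtin(global_invocation_id) gid: vec3u) {
    // Size of mip level 0, as a vec2u.
    let size = textureDimensions(depth_buffer);

    // Skip invocations outside the texture. This only matters once the
    // workgroup size is larger than 1 and the dispatch is rounded up.
    if (gid.x >= size.x || gid.y >= size.y) {
        return;
    }

    let depth = textureLoad(depth_buffer, gid.xy, 0);

    // Pixel center in [0, 1], measured from the top-left corner.
    let uv = (vec2f(gid.xy) + vec2f(0.5)) / vec2f(size);

    // Clip space Y points up, texture rows go down, so flip Y.
    let clip_space = vec2f(uv.x, 1.0 - uv.y) * 2.0 - 1.0;

    let world_space_hc = camera.inverse_view_proj_matrix * vec4f(clip_space, depth, 1.0);
    let world_space = world_space_hc.xyz / world_space_hc.w;

    // ... use world_space here ...
}
```

With the + 0.5 offset, the reconstructed position corresponds to the point the rasterizer sampled for that pixel, which is why this variant is preferred over dividing by screen_size - 1.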