Tags: c++, math, matrix, directx, hlsl

Matrix Hell - Transforming a Point in a 3D Texture to World Space


Recently I have decided to add volumetric fog to my 3D game in DirectX. The technique I am using is from the book GPU Pro 6, but it is not necessary for you to own a copy of the book in order to help me :). Basically, the volumetric fog information is stored in a 3D texture. Now, I need to transform each texel of that 3D texture to world space. The texture is view-aligned, and by that I mean the X and the Y of that texture map to the X and Y of the screen, and the Z of that texture extends forwards in front of the camera. So essentially I need a function:

float3 CalculateWorldPosition(uint3 Texel)
{
    //Do math
}

I know the view matrix, and the dimensions of the 3D texture (190x90x64 or 190x90x128), the projection matrix for the screen, etc.

However, that is not all, unfortunately.

The depth buffer in DirectX is not linear, as you may know. The same effect needs to be applied to my 3D texture: texels need to be skewed so that there are more of them near the camera than far away, since detail near the camera must be better than detail in the distance. I think I already have a function that does this; correct me if I'm wrong:

//Where depth = 0, the texel is closest to the camera.
//Where depth = 1, the texel is the furthest from the camera.
//This function returns a new Z value between 0 and 1, skewing it
// so more Z values are near the camera.
float GetExponentialDepth(float depth /*0 to 1*/)
{
    depth = 1.0f - depth;

    //Near and far planes
    float near = 1.0f;
    //g_WorldDepth is the depth of the 3D texture in world/view space
    float far = g_WorldDepth;

    float linearZ = -(near + depth * (far - near));
    float a = (2.0f * near * far) / (near - far);
    float b = (far + near) / (near - far);
    float result = (a / -linearZ) - b;
    return -result * 0.5f + 0.5f;
}

Here is my current function that tries to find the world position from the texel (note that it is wrong):

float3 CalculateWorldPos(uint3 texel)
{
    //Divide the texel by the dimensions, to get a value between 0 and 1 for
    // each of the components
    float3 pos = (float3)texel * float3(1.0f / 190.0f, 1.0f / 90.0f, 1.0f / (float)(g_Depth-1));
    pos.xy = 2.0f * pos.xy - float2(1.0f, 1.0f);
    //Skew the depth
    pos.z = GetExponentialDepth(pos.z);
    //Multiply this point, which should be in NDC coordinates,
    // by the inverse of (View * Proj)
    return mul(float4(pos, 1.0f), g_InverseViewProj).xyz;
}

However, projection matrices are also a little confusing to me, so here is the line that builds the projection matrix for the 3D texture, in case it is part of the problem:

//Note that the X and Y of the texture are 190 and 90, respectively.
//m_WorldDepth is the depth of the cuboid in world space.
XMMatrixPerspectiveFovLH(pCamera->GetFovY(), 190.0f / 90.0f, 1.0f, m_WorldDepth)

Also, I have read that projection matrices are not invertible (their inverse does not exist). If that is true, then maybe finding the inverse of (View * Proj) is incorrect, I'm not sure.

So, just to reiterate the question, given a 3D texture coordinate to a view-aligned cuboid, how can I find the world position of that point?

Thanks so much in advance, this problem has eaten up a lot of my time!


Solution

  • Let me first explain what the perspective projection matrix does.

    The perspective projection matrix transforms a vector from view space to clip space, such that the x/y coordinates correspond to the horizontal/vertical position on the screen and the z coordinate corresponds to the depth. A vertex that is positioned znear units away from the camera is mapped to depth 0. A vertex that is positioned zfar units away from the camera is mapped to depth 1. The depth values right behind znear increase very quickly, whereas the depth values right in front of zfar only change slowly.

    Specifically, given a z-coordinate, the resulting depth is:

    depth = zfar / (zfar - znear) * (z - znear) / z
    

    If you slice the frustum at evenly spaced depth values (e.g. every 0.1), you get cells, and the cells at the front are thinner than those at the back. If you draw enough slices, these cells map to your texels. This is exactly the configuration you want: there are more cells at the front (resulting in a higher resolution) than at the back. So you can just use the texel coordinate (normalized to the [0,1] range) as the depth value. Here is the standard back projection of a given depth value into view space (assuming znear = 1, zfar = 10):

    [Figure: back projection from clip space to view space]

    Your code doesn't work because of this line:

    return mul(float4(pos, 1.0f), g_InverseViewProj).xyz;
    

    There is a reason why we use 4D vectors and matrices. If you just throw the fourth dimension away, you get the wrong result. Instead, perform the divide by w (the w-clip):

    float4 transformed = mul(float4(pos, 1.0f), g_InverseViewProj);
    return (transformed / transformed.w).xyz;
    

    By the way, the 4D perspective projection matrix is perfectly invertible. Only if you remove one dimension do you get a non-square matrix, which is not invertible. But that is not what we usually do in computer graphics, even though such reduced matrices are also called projections (in a different, mathematical context).