So I've decided to rewrite an old ray tracer of mine, originally written in C++, in C#, leveraging the XNA framework.
I still have my old book and can follow its notes, but I'm confused about a few ideas and was wondering whether someone could articulate them nicely.
for each x pixel do
    for each y pixel do
        // Generate Ray
        // 1 - Calculate world coordinates of current pixel
        // 1.1 Calculate Normalized Device Coordinates for current pixel, -1 to 1 (u, v)
        u = (2.0f * x / WIDTH) - 1;   // 2.0f forces float division; 2*x/WIDTH would truncate with ints
        v = (2.0f * y / HEIGHT) - 1;
        Vector3 rayDirection = -focalLength * w' + u * u' + v * v';
In the above code u', v' and w' are the orthonormal basis vectors calculated for the given camera (I know the reuse of the names u and v makes it confusing).
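The loop above can be sketched in runnable C# along these lines. This is only a sketch of the book-style approach, assuming XNA's `Vector3`; `camBasisU`, `camBasisV` and `camBasisW` are hypothetical names standing in for the camera's orthonormal basis (u', v', w'), and the axis-aligned values chosen here are just placeholders:

```csharp
using Microsoft.Xna.Framework;

const int WIDTH = 800;
const int HEIGHT = 600;

// Hypothetical camera basis: right, up, and the backward-facing view axis.
// A real camera would derive these from its position/target/up vectors.
Vector3 camBasisU = Vector3.Right;
Vector3 camBasisV = Vector3.Up;
Vector3 camBasisW = Vector3.Backward;   // points from the scene toward the eye
float focalLength = 1.0f;

for (int x = 0; x < WIDTH; x++)
{
    for (int y = 0; y < HEIGHT; y++)
    {
        // Map the pixel centre to NDC in [-1, 1]; the 0.5f offset samples
        // the middle of the pixel rather than its corner.
        float u = (2.0f * (x + 0.5f) / WIDTH) - 1.0f;
        float v = (2.0f * (y + 0.5f) / HEIGHT) - 1.0f;

        // The ray leaves the eye along -w', offset by u and v in the image plane.
        Vector3 rayDirection = Vector3.Normalize(
            -focalLength * camBasisW + u * camBasisU + v * camBasisV);
    }
}
```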
If I follow the book and do it the way it describes, it works. However, I'm trying to leverage XNA and I'm getting confused about how to perform the same actions using matrices.
So I've tried to replace those steps with the following XNA code:
class Camera
{
    public Vector3 Position { get; set; }
    public Vector3 Direction { get; set; }
    public Vector3 Up { get; set; }
    public float AspectRatio, FOV, NearPlane, FarPlane;
    public Matrix ViewMatrix, ProjectionMatrix;

    public Camera(float width, float height)
    {
        AspectRatio = width / height;
        FOV = MathHelper.PiOver2;   // Math.PI / 2.0f is a double; MathHelper gives a float
        NearPlane = 1.0f;
        FarPlane = 100.0f;
        // CreateLookAt expects a target *point*, so look along Direction from Position
        ViewMatrix = Matrix.CreateLookAt(Position, Position + Direction, Up);
        ProjectionMatrix = Matrix.CreatePerspectiveFieldOfView(FOV,
            AspectRatio, NearPlane, FarPlane);
    }
}
It's at this point that I'm confused about the order of operations I'm supposed to apply in order to get the direction vector for an arbitrary pixel (x, y).
In my head I'm thinking: (u,v) = ProjectionMatrix * ViewMatrix * ModelToWorld * Vertex(in model space)
Therefore it would make sense that
Vertex (in world space) = Inverse(ViewMatrix) * Inverse(ProjectionMatrix) * [u, v, 0]
I also remembered something about how the view matrix's rotation part can be transposed instead of inverted, since it is orthonormal (the full view matrix isn't, because of the translation).
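One way to express the matrix route is to unproject a point on the near plane: take the pixel's NDC coordinates, transform them through the inverse of the combined view-projection matrix, do the perspective divide, and subtract the camera position. The sketch below assumes XNA types and that `viewMatrix`, `projectionMatrix`, `cameraPosition`, `u` and `v` are already set up; note that XNA multiplies row vectors on the left (p' = p * M), so the combined matrix is `viewMatrix * projectionMatrix`, the reverse of the column-vector order written above:

```csharp
using Microsoft.Xna.Framework;

// Clip -> world is the inverse of world -> clip (view * projection in XNA's
// row-vector convention).
Matrix inverseViewProjection = Matrix.Invert(viewMatrix * projectionMatrix);

// A point on the near plane in clip space. XNA's NDC depth range is 0..1,
// so the near plane sits at z = 0; w = 1 makes it a proper homogeneous point.
Vector4 nearPoint = new Vector4(u, v, 0.0f, 1.0f);

// Back to world space; the divide by W undoes the projection's perspective.
Vector4 worldPoint = Vector4.Transform(nearPoint, inverseViewProjection);
worldPoint /= worldPoint.W;

Vector3 rayDirection = Vector3.Normalize(
    new Vector3(worldPoint.X, worldPoint.Y, worldPoint.Z) - cameraPosition);
```

If you'd rather start from raw pixel coordinates instead of NDC, XNA's `Viewport.Unproject` performs essentially the same computation for you.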
The reason for NDC is so that you can map an image of any width/height in pixels onto a uniform coordinate range (not necessarily 1:1). Essentially what I understood was the following: