I'm writing a ray tracer for which I need to cast rays from the screen into the world. I'm using the inverse of the view-projection-viewport matrix to get back from screen pixel coordinates to world space coordinates.
I noticed by accident that two elements of the inverse matrix are always 0, no matter how or where I move, zoom, orbit, or spin the camera. I don't understand the projection matrix deeply enough to know why.
This is a hopefully-relevant portion of my Matrix class:
// OpenGL-style column-vectors layout.
public final class Matrix4 {
    public double e11, e12, e13, e14;
    public double e21, e22, e23, e24;
    public double e31, e32, e33, e34;
    public double e41, e42, e43, e44;

    // Transforms v in place as a point (input w = 1), then applies the perspective divide.
    public void transform(Vec3 v) {
        double x = v.x, y = v.y, z = v.z;
        double w = (x * e41) + (y * e42) + (z * e43) + (e44);
        double norm = 1 / w;
        v.x = ((x * e11) + (y * e12) + (z * e13) + (e14)) * norm;
        v.y = ((x * e21) + (y * e22) + (z * e23) + (e24)) * norm;
        v.z = ((x * e31) + (y * e32) + (z * e33) + (e34)) * norm;
    }

    // Left-handed perspective projection mapping view-space z in [znear, zfar] to depth [0, 1].
    // Elements not assigned here are assumed to already be 0 (including e44).
    public void perspectiveFovLH(double fovy, double aspect, double znear, double zfar) {
        double yscale = 1 / Math.tan(fovy * 0.5);
        double xscale = yscale / aspect;
        double Q = zfar / (zfar - znear);
        e11 = xscale;
        e22 = yscale;
        e33 = Q;
        e34 = Q * -znear;
        e43 = 1; // output w = view-space z
    }
}
This is the view-projection-viewport matrix after moving the camera around for a while:
[ -331.62051997, -616.31014741, -927.45531750, 354532.72229686
158.60434983, -1091.70516425, 427.49500301, 420008.37868979
0.75468455, -0.51045982, -0.41337968, 766.88486375
0.75430721, -0.51020459, -0.41317299, 771.50142132 ]
And this is its inverse:
[ -0.00058203, -0.00019009, 113.94949731, -112.89669211
-0.00033747, -0.00072765, -84.29481494, 84.34162134
-0.00064586, 0.00055151, -61.14292715, 60.77360931
-0.00000000, -0.00000000, -0.19990000, 0.20000000 ]
Most of the elements change continuously as I move the camera. The two elements in the bottom right of the inverse seem to be directly related to the znear and zfar parameters. The two elements in the bottom left always seem to be constant 0.
This fact is useful in the transform method. If e41 and e42 are 0, and input z is always 0 or 1 when casting the near and far points of a ray into the screen, the w division can be computed ahead of time, rather than per-pixel. This is working and is a useful speedup.
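To make the speedup concrete, here is a minimal sketch of hoisting the divide out of the per-pixel loop. The class and method names are hypothetical and the matrices are plain row-major `double[4][4]` arrays rather than the `Matrix4` fields above; the tolerance used to decide that e41 and e42 are "zero" is an arbitrary assumption. The data is the inverse matrix as posted, rounded to 8 decimals.

```java
// Sketch only: names are hypothetical, matrices are row-major double[4][4].
final class UnprojectSketch {
    // Inverse view-projection-viewport matrix as posted (rounded to 8 decimals).
    static final double[][] INV = {
        {-0.00058203, -0.00019009, 113.94949731, -112.89669211},
        {-0.00033747, -0.00072765, -84.29481494,   84.34162134},
        {-0.00064586,  0.00055151, -61.14292715,   60.77360931},
        {-0.00000000, -0.00000000,  -0.19990000,    0.20000000},
    };

    // Tolerance for treating e41/e42 as zero; the value is an arbitrary assumption.
    static final double EPS = 1e-12;

    // General path: recomputes w for every point, like Matrix4.transform.
    static double[] transform(double[][] m, double x, double y, double z) {
        double w = x * m[3][0] + y * m[3][1] + z * m[3][2] + m[3][3];
        return applyWithNorm(m, x, y, z, 1 / w);
    }

    // 1/w for a fixed screen-space depth z. Valid only while e41 and e42 are 0,
    // because only then is w independent of the pixel's x and y.
    static double reciprocalW(double[][] m, double z) {
        if (Math.abs(m[3][0]) > EPS || Math.abs(m[3][1]) > EPS) {
            throw new IllegalStateException("w depends on x/y; cannot precompute 1/w");
        }
        return 1 / (z * m[3][2] + m[3][3]);
    }

    // Fast path: top three rows only, scaled by a precomputed 1/w.
    static double[] applyWithNorm(double[][] m, double x, double y, double z, double norm) {
        return new double[] {
            (x * m[0][0] + y * m[0][1] + z * m[0][2] + m[0][3]) * norm,
            (x * m[1][0] + y * m[1][1] + z * m[1][2] + m[1][3]) * norm,
            (x * m[2][0] + y * m[2][1] + z * m[2][2] + m[2][3]) * norm,
        };
    }

    public static void main(String[] args) {
        double nearNorm = reciprocalW(INV, 0); // computed once per frame
        double farNorm = reciprocalW(INV, 1);
        double[] near = applyWithNorm(INV, 320, 240, 0, nearNorm); // per pixel
        double[] far = applyWithNorm(INV, 320, 240, 1, farNorm);
        System.out.println(java.util.Arrays.toString(near));
        System.out.println(java.util.Arrays.toString(far));
    }
}
```

The guard in `reciprocalW` is the point of the sketch: if the assumption ever stops holding, the fast path fails loudly instead of silently producing wrong rays.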
However, I'm worried this assumption will break something later. So I'd like to know, what does it mean that these two elements are 0, and when will this assumption hold?
Edit: I found out that an affine 4x4 matrix has (0 0 0 1) as its last row. My inverse matrix almost matches that. But what is the magic that makes the inverse of a perspective matrix affine, or almost affine?
For the given perspectiveFovLH matrix $P$ and any invertible affine transformation matrix $M$ (in homogeneous form), the inverse $R := (PM)^{-1}$ always has $R_{wx} = R_{wy} = 0$.
To see this, let $t = Mv$ for some input $v$, and let $s = Pt = PMv$, so that $v = Rs$. Every affine $M$ has $M_{wx} = M_{wy} = M_{wz} = 0$ and $M_{ww} = 1$, so $t_w = v_w$. Then observe from the sparseness of $P$ that $s_w = t_z$ and that, writing $z_0$ for znear,
$$s_z = Q t_z - Q z_0 t_w = Q s_w - Q z_0 v_w.$$
Solving the last equation for $v_w$ yields
$$v_w = \frac{Q s_w - s_z}{Q z_0}.$$
This formula manifestly depends only on $s_z$ and $s_w$, so the coefficients of $s_x$ and $s_y$ in the bottom row of $R$ vanish, demonstrating as supposed that $R_{wx} = R_{wy} = 0$.
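The claim can also be checked numerically. The following is a sketch under stated assumptions: `P` is built as in perspectiveFovLH with every unassigned element taken to be 0, `M` is just one example affine matrix (a rotation about y plus a translation), the viewport transform from the question is left out, and all class and method names are hypothetical. The inverse is computed with a plain Gauss-Jordan elimination.

```java
// Sketch only: checks that the bottom row of (P*M)^-1 is (0, 0, -1/(Q*znear), 1/znear).
// Matrices are row-major double[4][4]; all names here are hypothetical.
final class InverseBottomRowCheck {
    // P as in perspectiveFovLH, assuming all elements not assigned there are 0.
    static double[][] perspectiveFovLH(double fovy, double aspect, double znear, double zfar) {
        double yscale = 1 / Math.tan(fovy * 0.5);
        double xscale = yscale / aspect;
        double q = zfar / (zfar - znear);
        return new double[][] {
            {xscale, 0, 0, 0},
            {0, yscale, 0, 0},
            {0, 0, q, -q * znear},
            {0, 0, 1, 0},
        };
    }

    // Example affine matrix: rotation about the y axis followed by a translation.
    static double[][] affine(double angle, double tx, double ty, double tz) {
        double c = Math.cos(angle), s = Math.sin(angle);
        return new double[][] {
            {c, 0, s, tx},
            {0, 1, 0, ty},
            {-s, 0, c, tz},
            {0, 0, 0, 1},
        };
    }

    static double[][] multiply(double[][] a, double[][] b) {
        double[][] r = new double[4][4];
        for (int i = 0; i < 4; i++)
            for (int j = 0; j < 4; j++)
                for (int k = 0; k < 4; k++)
                    r[i][j] += a[i][k] * b[k][j];
        return r;
    }

    // Gauss-Jordan elimination with partial pivoting on the augmented matrix [m | I].
    static double[][] invert(double[][] m) {
        int n = 4;
        double[][] a = new double[n][2 * n];
        for (int i = 0; i < n; i++) {
            System.arraycopy(m[i], 0, a[i], 0, n);
            a[i][n + i] = 1;
        }
        for (int col = 0; col < n; col++) {
            int pivot = col;
            for (int r = col + 1; r < n; r++) {
                if (Math.abs(a[r][col]) > Math.abs(a[pivot][col])) pivot = r;
            }
            if (a[pivot][col] == 0) throw new ArithmeticException("matrix is singular");
            double[] tmp = a[col]; a[col] = a[pivot]; a[pivot] = tmp;
            double d = a[col][col];
            for (int j = 0; j < 2 * n; j++) a[col][j] /= d;
            for (int r = 0; r < n; r++) {
                if (r == col) continue;
                double f = a[r][col];
                for (int j = 0; j < 2 * n; j++) a[r][j] -= f * a[col][j];
            }
        }
        double[][] inv = new double[n][n];
        for (int i = 0; i < n; i++) System.arraycopy(a[i], n, inv[i], 0, n);
        return inv;
    }

    public static void main(String[] args) {
        double[][] p = perspectiveFovLH(Math.toRadians(60), 4.0 / 3.0, 5, 10000);
        double[][] r = invert(multiply(p, affine(0.7, 3, -2, 20)));
        // Bottom row of the inverse; the first two entries should be ~0 up to rounding.
        System.out.println(java.util.Arrays.toString(r[3]));
    }
}
```

Because the inverse is computed in floating point, the first two entries of the bottom row may come out as tiny nonzero values rather than exact zeros, so any check on them should use a tolerance.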
We can understand this in a non-formulaic fashion: 3×4 matrices are sufficient for perspective on a screen, because you need only 2 output coordinates and the one homogeneous coordinate by which to divide them. Using the full 4×4 matrix introduces a redundancy because both the z and w coordinates of the result derive only from the z coordinate of the input (and the known constant 1 for the initial w). Regardless of the world transformation applied before the projection, you still have those two equations in the transformed z and the original w, which is enough to solve for them.
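Those two equations can be written out as a 2×2 system and inverted by hand. This is a sketch in the same symbols as the derivation, with $z_0$ read as znear (my interpretation of the notation):

```latex
\[
\begin{pmatrix} s_z \\ s_w \end{pmatrix}
=
\begin{pmatrix} Q & -Q z_0 \\ 1 & 0 \end{pmatrix}
\begin{pmatrix} t_z \\ t_w \end{pmatrix}
\quad\Longrightarrow\quad
\begin{pmatrix} t_z \\ t_w \end{pmatrix}
=
\begin{pmatrix} 0 & 1 \\[4pt] -\dfrac{1}{Q z_0} & \dfrac{1}{z_0} \end{pmatrix}
\begin{pmatrix} s_z \\ s_w \end{pmatrix}
\]
```

Since $t_w = v_w$, the second row of the inverted system is the bottom row of $R$: $(0,\ 0,\ -1/(Q z_0),\ 1/z_0)$. For example, znear = 5 and zfar = 10000 would give $-0.1999$ and $0.2$, which agrees with the posted inverse; those parameter values are inferred rather than stated in the question, and the match assumes the viewport transform leaves z and w unchanged.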