After about two weeks of reading every article, online tutorial and eBook I could lay my eyes on about 3D camera space, I've once again found myself trying to follow this article in order to create a proper 3D perspective camera. I am not entirely fluent in mathematical notation beyond high-school-level algebra, but for the most part I think I understand the large perspective matrix formula about halfway down the page. Just to be on the safe side, though, I came here for a bit of clarity... And for the record, I realize that what follows is really more than a single question, so I don't expect answers to all of them... Just a bit more direction before I go implementing yet another broken concept in my render engine.
And BEFORE anybody gets the wrong idea...
NO!
This is not for school (I get asked that a lot).
I am not even a student, just a long-time software engineering enthusiast/hobbyist.
So the only "grades" I'll get out of this are measured in degrees, and converted into radians...
(Yay! Math joke! I've sloped up to comedy!)
Okay so, first off... That formula on Wikipedia is for a left-handed coordinate system, or so they say. What would I need to do to make it work for a right-handed coordinate system (where negative Z travels away from the user and into the screen)? Second, am I correct in assuming that the 3x3 matrices in the formula correspond to rotation about the X, Y and Z axes, respectively? If so, does each of the theta symbols (θx, θy and θz) represent the camera's current yaw, pitch and roll (I ask because it mentions the use of Euler angles)? Also, I've seen a mix of articles and online tutorials that use 4x4 matrices instead of 3x3 matrices (or none at all!)... Does using a 4x4 matrix for perspective transforms offer any benefit?
And finally, my current goals require me to manually perform all matrix calculations (since I'm not using any external libraries such as OpenGL or DirectX). When performing transforms, I believe I'm supposed to multiply across the columns and sum across the rows for each individual matrix, correct? If that's the case, then how does one "strip out" the resulting X, Y and Z-coordinates from their respective transformed matrices (...or do I just have the totally wrong idea of how this formula is supposed to work)?
Thank you for your time!
First, a little explanation
In standard 3D APIs there are three transformations: the world matrix, the view matrix and the projection matrix. The first has nothing to do with the camera; it transforms an object from its own local space into the world, giving it the right rotation and translation. The second, the view matrix, is very similar to the world matrix, but instead of rotating and translating the world it is responsible for the rotation and translation of the camera (in effect, it is the inverse of the transform that places the camera in the world). So creating it is mostly a matter of multiplying transformation matrices (rotation, translation and scale).
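To make the world/view idea concrete, here is a minimal sketch in plain Python (no libraries, since the question rules them out). It assumes DirectX's row-vector convention and a camera that only yaws about the Y axis; pitch and roll would simply be more rotation matrices multiplied in. The helper names (`mat_mul`, `transform`, `translation`, `rotation_y`, `view_matrix`) are hypothetical, not part of any API. The view matrix is built as the inverse of the camera's own placement: undo its translation, then undo its rotation.

```python
import math

# Row-vector convention (as in DirectX): a point is [x, y, z, 1] and is
# transformed as p' = p * M, so a product A * B applies A first, then B.

def mat_mul(a, b):
    """Multiply two 4x4 matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def transform(p, m):
    """Transform the row vector p = [x, y, z, w] by the 4x4 matrix m."""
    return [sum(p[k] * m[k][j] for k in range(4)) for j in range(4)]

def translation(tx, ty, tz):
    """Translation lives in the fourth row in row-vector form."""
    return [[1, 0, 0, 0],
            [0, 1, 0, 0],
            [0, 0, 1, 0],
            [tx, ty, tz, 1]]

def rotation_y(theta):
    """Rotation about the Y axis (yaw) by theta radians, row-vector form."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, 0, -s, 0],
            [0, 1, 0, 0],
            [s, 0, c, 0],
            [0, 0, 0, 1]]

def view_matrix(cam_pos, yaw):
    """Hypothetical helper: the camera is placed by rotation_y(yaw) followed
    by translation(cam_pos), so its inverse undoes the translation first and
    then the rotation."""
    x, y, z = cam_pos
    return mat_mul(translation(-x, -y, -z), rotation_y(-yaw))

# Hypothetical scene: an object moved to (2, 0, 10) by its world matrix,
# viewed by a camera at (2, 0, 0) yawed by 0.3 radians.
world = translation(2, 0, 10)
view = view_matrix((2, 0, 0), 0.3)
world_view = mat_mul(world, view)

local_point = [0, 0, 0, 1]  # the object's local origin
print(transform(local_point, world_view))
```

With row vectors, `world * view` applies the world matrix first; with column vectors every matrix is transposed and the multiplication order is reversed.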
The last one, and the one most related to the camera concept, is the projection matrix:
Why 4x4?:
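The reason this matrix is usually presented as 4x4 is that these three matrices form a pipeline that converts 3D coordinates into pixels. A translation cannot be written as a 3x3 matrix (a 3x3 matrix always maps the origin to itself), so translations have to be 4x4 matrices acting on homogeneous coordinates (x, y, z, 1). It is then more practical to define every other matrix in the pipeline, including the projection matrix, the same way, so that they can all be multiplied together into a single matrix. The extra column of the projection matrix also carries the depth (z, or -z) into the w component, which the final divide by w uses to make distant objects smaller.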
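A tiny plain-Python illustration of why translation forces the jump to 4x4 (the helper names `apply3` and `apply4` are made up for this sketch, and the 3x3 matrix is an arbitrary example):

```python
# A 3x3 matrix is a linear map, so it always sends the origin to itself:
# no choice of entries can move (0, 0, 0) to (5, 0, 0).
def apply3(m, v):
    """Row vector v (length 3) times the 3x3 matrix m."""
    return [sum(v[k] * m[k][j] for k in range(3)) for j in range(3)]

def apply4(m, v):
    """Row vector v (length 4) times the 4x4 matrix m."""
    return [sum(v[k] * m[k][j] for k in range(4)) for j in range(4)]

any_3x3 = [[2, 1, 0], [0, 3, 4], [7, 0, 1]]
print(apply3(any_3x3, [0, 0, 0]))       # [0, 0, 0]

# With homogeneous coordinates (x, y, z, 1), the fourth row of a 4x4 matrix
# holds the translation, so "move by (5, 0, 0)" becomes a plain multiply.
translate = [[1, 0, 0, 0],
             [0, 1, 0, 0],
             [0, 0, 1, 0],
             [5, 0, 0, 1]]
print(apply4(translate, [0, 0, 0, 1]))  # [5, 0, 0, 1]
# A direction (w = 0) is unaffected by translation; only points move.
print(apply4(translate, [1, 0, 0, 0]))  # [1, 0, 0, 0]
```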
Matter of Right or Left Hand:
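This is what you would have for a left-handed coordinate system (DirectX treats points as row vectors, so a point is transformed as [x y z 1] * M):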
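$$
\begin{pmatrix}
\frac{2\,zn}{w} & 0 & 0 & 0 \\
0 & \frac{2\,zn}{h} & 0 & 0 \\
0 & 0 & \frac{zf}{zf - zn} & 1 \\
0 & 0 & \frac{zn \cdot zf}{zn - zf} & 0
\end{pmatrix}
$$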
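The left-handed matrix can be exercised by hand, which also covers how to "strip out" X, Y and Z: multiply the row vector [x, y, z, 1] by the matrix (each output component is that row vector dotted with one column of the matrix), then divide the first three results by the fourth one, w. This is a sketch under assumptions: the sizes w=4, h=2, zn=1, zf=100 are arbitrary, points at or behind the camera are not clipped, and the hypothetical `to_pixels` step assumes a screen whose y axis points down.

```python
def perspective_lh(w, h, zn, zf):
    """The left-handed matrix above, stored as a list of rows."""
    return [[2 * zn / w, 0, 0, 0],
            [0, 2 * zn / h, 0, 0],
            [0, 0, zf / (zf - zn), 1],
            [0, 0, zn * zf / (zn - zf), 0]]

def project(point, m):
    """Project (x, y, z) to normalized device coordinates.

    Each output component is the row vector [x, y, z, 1] dotted with one
    column of m; the results are then divided by the fourth one (w).
    Assumes z > 0 (in front of the camera): real code clips first.
    """
    x, y, z = point
    v = [x, y, z, 1.0]
    clip = [sum(v[k] * m[k][col] for k in range(4)) for col in range(4)]
    clip_w = clip[3]  # with this matrix, w equals the input z
    return clip[0] / clip_w, clip[1] / clip_w, clip[2] / clip_w

def to_pixels(ndc_x, ndc_y, screen_w, screen_h):
    """Hypothetical viewport step: NDC [-1, 1] to pixels, y pointing down."""
    return (ndc_x + 1) * 0.5 * screen_w, (1 - ndc_y) * 0.5 * screen_h

proj = perspective_lh(w=4.0, h=2.0, zn=1.0, zf=100.0)
print(project((2.0, 1.0, 1.0), proj))    # near-plane corner -> (1.0, 1.0, 0.0)
print(project((0.0, 0.0, 100.0), proj))  # far-plane centre -> depth close to 1.0
print(to_pixels(1.0, 1.0, 800, 600))     # top-right corner -> (800.0, 0.0)
```

Depth lands in [0, 1] between the near and far planes, while x and y land in [-1, 1] across the near-plane rectangle of size w by h.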
Now compare that left-handed matrix with the right-handed one:
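$$
\begin{pmatrix}
\frac{2\,zn}{w} & 0 & 0 & 0 \\
0 & \frac{2\,zn}{h} & 0 & 0 \\
0 & 0 & \frac{zf}{zn - zf} & -1 \\
0 & 0 & \frac{zn \cdot zf}{zn - zf} & 0
\end{pmatrix}
$$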
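As you can see, the only difference is in the third row, which is the row that multiplies the z coordinate: its two non-zero entries change sign (zf/(zf-zn) becomes zf/(zn-zf), and 1 becomes -1). Negating that row has the same effect as flipping the sign of z before applying the left-handed matrix, which is exactly the difference between the two systems: in a right-handed system the camera looks down negative Z.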
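A quick numeric check of that claim, again as a hypothetical plain-Python sketch with arbitrary sizes (w=4, h=2, zn=1, zf=100): the right-handed matrix applied to (x, y, z) gives exactly the same result as the left-handed matrix applied to (x, y, -z), so in the right-handed system the visible points have negative z.

```python
def perspective_lh(w, h, zn, zf):
    return [[2 * zn / w, 0, 0, 0],
            [0, 2 * zn / h, 0, 0],
            [0, 0, zf / (zf - zn), 1],
            [0, 0, zn * zf / (zn - zf), 0]]

def perspective_rh(w, h, zn, zf):
    # Identical to perspective_lh except for the third row, which is negated.
    return [[2 * zn / w, 0, 0, 0],
            [0, 2 * zn / h, 0, 0],
            [0, 0, zf / (zn - zf), -1],
            [0, 0, zn * zf / (zn - zf), 0]]

def mul(v, m):
    """Row vector [x, y, z, w] times a 4x4 matrix."""
    return [sum(v[k] * m[k][col] for k in range(4)) for col in range(4)]

lh = perspective_lh(4.0, 2.0, 1.0, 100.0)
rh = perspective_rh(4.0, 2.0, 1.0, 100.0)

# A right-handed point 10 units in front of the camera has z = -10 ...
p_rh = [1.0, 0.5, -10.0, 1.0]
# ... and projects exactly like the left-handed point with z = +10.
p_lh = [1.0, 0.5, 10.0, 1.0]
print(mul(p_rh, rh))
print(mul(p_lh, lh))  # the same four numbers
```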
Explanation of this Article
What I wrote above is what DirectX uses to create a perspective projection matrix. The link you provided, however, was a Wikipedia page with another form of the same matrix. So why does my matrix have no trigonometry in it? There are two reasons:
w : Width of the view volume at the near view-plane.
h : Height of the view volume at the near view-plane.
zn : Z-value of the near view-plane.
zf : Z-value of the far view-plane.
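Because w and h are measured at the near plane, they can be derived from a vertical field of view fovY: h = 2*zn*tan(fovY/2) and w = aspect*h. That turns the 2*zn/h entry into 1/tan(fovY/2), the cotangent term that FOV-based formulas use, so the trigonometry is simply folded into w and h. A small sketch with a hypothetical `near_plane_size` helper and arbitrary 60°, 16:9 values:

```python
import math

def near_plane_size(fov_y, aspect, zn):
    """Hypothetical helper: width and height of the view volume at the near
    plane for a vertical field of view fov_y (radians), aspect = w / h."""
    h = 2 * zn * math.tan(fov_y / 2)
    return aspect * h, h

fov_y, aspect, zn = math.radians(60), 16 / 9, 0.1
w, h = near_plane_size(fov_y, aspect, zn)

y_scale = 2 * zn / h           # the entry from the matrices above
cot = 1 / math.tan(fov_y / 2)  # the trigonometric form
print(y_scale, cot)            # equal up to floating-point rounding
```

The x entry follows the same way: 2*zn/w equals cot(fovY/2) divided by the aspect ratio.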