opengl graphics computer-vision projection camera-calibration

Get camera matrix from OpenGL

I render a 3D mesh model using OpenGL with a perspective camera: gluPerspective(fov, aspect, near, far).

Then I use the rendered image in a computer vision algorithm.

At some point that algorithm requires the camera matrix K (along with several vertices on the model and their corresponding projections) in order to estimate the camera pose: rotation matrix R and translation vector t. I can estimate R and t with any algorithm that solves the Perspective-n-Point (PnP) problem.

I construct K from the OpenGL projection matrix (see how here)

K = [fX, 0, pX | 0, fY, pY | 0, 0, 1]

If I want to project a model point 'by hand' I can compute:

X_proj = K*(R*X_model + t)
x_pixel = X_proj[1] / X_proj[3]
y_pixel = X_proj[2] / X_proj[3]

Anyway, I pass this camera matrix in a PnP algorithm and it works just fine.

But then I had to change the perspective projection to an orthographic one. As far as I understand, with an orthographic projection the camera matrix becomes:

K = [1, 0, 0 | 0, 1, 0 | 0, 0, 0]

So I changed gluPerspective to glOrtho. Following the same way I constructed K from OpenGL projection matrix, and it turned out that fX and fY are not ones but 0.0037371. Is this a scaled orthographic projection or what?

Moreover, in order to project model vertices 'by hand' I managed to do the following:

X_proj = K*(R*X_model + t)
x_pixel = X_proj[1] + width / 2
y_pixel = X_proj[2] + height / 2

This is not what I expected (adding half the width and half the height seems strange...). I tried passing this camera matrix to the POSIT algorithm to estimate R and t, and it doesn't converge. :(

So here are my questions:

How do I get an orthographic camera matrix from OpenGL?
If the way I did it is correct, is it a true orthographic projection? Why doesn't POSIT work?

Solution

An orthographic projection does not use depth to scale down points that are farther away. It does, however, scale points to fit inside normalized device coordinates (NDC), which means the values end up in the range [-1,1]. This matrix from Wikipedia shows what this means:
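For reference, the standard orthographic projection matrix, which is what glOrtho(left, right, bottom, top, near, far) builds, has the form below. The symbols l, r, b, t, n, f abbreviate those six arguments; note that t here is the top clipping plane, not the translation vector t from the question.

```latex
\[
P_{\text{ortho}} =
\begin{bmatrix}
\dfrac{2}{r-l} & 0 & 0 & -\dfrac{r+l}{r-l} \\
0 & \dfrac{2}{t-b} & 0 & -\dfrac{t+b}{t-b} \\
0 & 0 & -\dfrac{2}{f-n} & -\dfrac{f+n}{f-n} \\
0 & 0 & 0 & 1
\end{bmatrix}
\]
```

The diagonal entries 2/(r-l) and 2/(t-b) are the x and y scale factors, and the last row is (0, 0, 0, 1), so there is no division by depth.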

So, it is correct to have numbers other than 1.
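As a quick check of where such a value comes from, here is a minimal sketch that computes the x and y scale factors a glOrtho-style matrix places on its diagonal. The function name ortho_scale and the sample extents are assumptions chosen for illustration; they are not values from the question.

```python
def ortho_scale(left, right, bottom, top):
    """Return the (x, y) diagonal entries of a glOrtho-style matrix."""
    return 2.0 / (right - left), 2.0 / (top - bottom)


# Hypothetical view volume, 640 x 480 units, centered on the origin.
sx, sy = ortho_scale(-320.0, 320.0, -240.0, 240.0)
print(sx, sy)
```

Read the other way, a diagonal entry of 0.0037371 would correspond to a view volume roughly 2 / 0.0037371 ≈ 535 units wide.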

As for your manual computation, I believe the problem is that it does not scale the result back to screen coordinates. As I said, the output of a projection matrix lies in the range [-1,1], so to get pixel coordinates I believe you should do something like this:

X_proj = K*(R*X_model + t)
x_pixel = X_proj[1]*width/2 + width / 2
y_pixel = X_proj[2]*height/2 + height / 2
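The same mapping can be sketched in plain Python. The function names, the sample view volume and the flip_y option are assumptions made for illustration; flip_y covers the common case where the image origin is at the top-left while OpenGL's is at the bottom-left, which the formulas above do not address. Indices here are 0-based, unlike the 1-based notation above.

```python
def ortho_project(point, left, right, bottom, top):
    """Map a camera-space point (x, y, z) to NDC (x, y) in [-1, 1].

    Equivalent to the x and y rows of a glOrtho-style matrix;
    depth does not affect the result.
    """
    x, y, _z = point
    ndc_x = 2.0 * (x - left) / (right - left) - 1.0
    ndc_y = 2.0 * (y - bottom) / (top - bottom) - 1.0
    return ndc_x, ndc_y


def ndc_to_pixel(ndc_x, ndc_y, width, height, flip_y=False):
    """Scale NDC coordinates to pixel coordinates."""
    px = ndc_x * width / 2.0 + width / 2.0
    py = ndc_y * height / 2.0 + height / 2.0
    if flip_y:
        # Assumption: image rows grow downward from the top-left corner.
        py = height - py
    return px, py


# Hypothetical example: 640 x 480 view volume and image.
ndc = ortho_project((160.0, 120.0, -5.0), -320.0, 320.0, -240.0, 240.0)
print(ndc_to_pixel(ndc[0], ndc[1], 640, 480))
```

Changing the z value of the sample point leaves the result unchanged, which is the defining property of an orthographic projection.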

Anyway, I think you'd be better off using modern OpenGL with a library like GLM. In that case, you have the exact projection matrices at hand.