While specifying the model-view transformation we need three vectors that define the camera's local axes: the direction vector (where the camera points), the up vector, and the camera's right vector. Of these three axes, the direction vector is the camera's local Z axis.
The author at learnopengl.com mentions in the camera direction part of this section:
For the view matrix's coordinate system we want its z-axis to be positive and because by default (in OpenGL) the camera points towards the negative z-axis we want to negate the direction vector.
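For context, this is roughly how that chapter constructs the three axes (a minimal sketch using GLM; the positions are made-up example values):

```cpp
#include <glm/glm.hpp>

// Made-up example positions, just to illustrate the construction.
glm::vec3 cameraPos    = glm::vec3(0.0f, 0.0f, 3.0f);
glm::vec3 cameraTarget = glm::vec3(0.0f, 0.0f, 0.0f);
glm::vec3 worldUp      = glm::vec3(0.0f, 1.0f, 0.0f);

// Note the order pos - target: this is the *negated* viewing direction,
// i.e. the camera's local +Z axis.
glm::vec3 cameraDirection = glm::normalize(cameraPos - cameraTarget);
glm::vec3 cameraRight     = glm::normalize(glm::cross(worldUp, cameraDirection));
glm::vec3 cameraUp        = glm::cross(cameraDirection, cameraRight);
```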
My question is thus:
We formulate the view matrix ourselves and explicitly apply the transformation in the shader (passing the view/lookAt matrix as a uniform), so isn't the camera created by us?
Yes, exactly. And it was the same way even before shaders. There are just coordinate transformations; whether we interpret those as a "camera" is completely up to us.
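To make that concrete, here is a sketch of what such a self-made "camera" amounts to (the shader is a hypothetical minimal example; the commented CPU-side calls assume a hypothetical, already linked shaderProgram):

```cpp
#include <glm/glm.hpp>

// A hypothetical minimal vertex shader: the "camera" is nothing more than
// this uniform matrix multiplication that we wrote ourselves.
const char* vertexSrc = R"(
#version 330 core
layout (location = 0) in vec3 aPos;
uniform mat4 model;
uniform mat4 view;        // our hand-made "camera"
uniform mat4 projection;
void main()
{
    gl_Position = projection * view * model * vec4(aPos, 1.0);
}
)";

// CPU side (sketch; shaderProgram is a hypothetical, already linked program):
//   glm::mat4 view = glm::lookAt(cameraPos, cameraTarget, worldUp);
//   glUniformMatrix4fv(glGetUniformLocation(shaderProgram, "view"),
//                      1, GL_FALSE, glm::value_ptr(view));
```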
It's not like OpenGL provides us with a default camera object for us to work with, so where does this default camera even come from?
That tutorial is quite imprecise here: there is no default camera. However, there were some default conventions in legacy OpenGL, though these were only conventions and were never strictly required. Some of the legacy OpenGL functions were designed with those conventions in mind. The general idea was that the user works in a right-handed eye space, where x points to the right, y upwards, and z out of the screen towards the viewer, so -z is the viewing direction. The old gluLookAt() function follows these conventions.
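For illustration, here is a sketch of the matrix gluLookAt() specifies (glm::lookAt is equivalent); note how the forward vector f ends up negated in the third row, which is exactly the "camera looks down -z in eye space" convention:

```cpp
#include <glm/glm.hpp>

glm::mat4 myLookAt(glm::vec3 eye, glm::vec3 center, glm::vec3 up)
{
    glm::vec3 f = glm::normalize(center - eye);      // viewing direction
    glm::vec3 s = glm::normalize(glm::cross(f, up)); // right
    glm::vec3 u = glm::cross(s, f);                  // recomputed up

    glm::mat4 m(1.0f);
    // GLM is column-major: m[col][row].
    m[0][0] = s.x;  m[1][0] = s.y;  m[2][0] = s.z;  m[3][0] = -glm::dot(s, eye);
    m[0][1] = u.x;  m[1][1] = u.y;  m[2][1] = u.z;  m[3][1] = -glm::dot(u, eye);
    m[0][2] = -f.x; m[1][2] = -f.y; m[2][2] = -f.z; m[3][2] =  glm::dot(f, eye);
    return m;
}
```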
The old functions for creating projection matrices also adhere to this: glFrustum(), glOrtho(), and gluPerspective() all take near and far as positive distances along the viewing direction, but use z_eye = -near for the near plane and z_eye = -far for the far plane, respectively. They also set up the lower row to be (0, 0, -1, 0), so we end up doing the perspective divide by -z_eye and thereby get a left-handed NDC coordinate system (where z_ndc now points into the screen).
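A sketch of the glFrustum() matrix makes this visible (written column-major as GLM stores it; the -1 in the bottom row is what produces w_clip = -z_eye and hence the divide by -z_eye):

```cpp
#include <glm/glm.hpp>

// n and f are passed as positive distances, yet the matrix maps
// z_eye = -n and z_eye = -f to the near and far planes.
glm::mat4 myFrustum(float l, float r, float b, float t, float n, float f)
{
    glm::mat4 m(0.0f); // column-major: m[col][row]
    m[0][0] = 2.0f * n / (r - l);
    m[1][1] = 2.0f * n / (t - b);
    m[2][0] = (r + l) / (r - l);
    m[2][1] = (t + b) / (t - b);
    m[2][2] = -(f + n) / (f - n);
    m[2][3] = -1.0f;                    // bottom row: (0, 0, -1, 0)
    m[3][2] = -2.0f * f * n / (f - n);
    return m;
}
```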
Note that the matrix functions in glm are modeled to follow these conventions too, but you will also find functions suffixed with LH and RH, so you can choose the convention you like.
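For example (assuming a recent GLM version that ships the suffixed variants):

```cpp
#include <glm/glm.hpp>
#include <glm/gtc/matrix_transform.hpp>

// The default glm::lookAt/glm::perspective follow the legacy right-handed
// convention; the suffixed variants pick the handedness explicitly.
glm::vec3 eye(0.0f, 0.0f, 3.0f), center(0.0f), up(0.0f, 1.0f, 0.0f);

glm::mat4 viewRH = glm::lookAtRH(eye, center, up); // camera looks down -z
glm::mat4 viewLH = glm::lookAtLH(eye, center, up); // camera looks down +z

glm::mat4 projRH = glm::perspectiveRH(glm::radians(45.0f), 16.0f / 9.0f, 0.1f, 100.0f);
glm::mat4 projLH = glm::perspectiveLH(glm::radians(45.0f), 16.0f / 9.0f, 0.1f, 100.0f);
```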
We could assume the axes of our coordinate space point in any direction, right?
Yes. However, the clip space (which the gl_Position vertex shader output is in) is defined by the GL: the rasterizer will always use the x derived from x_clip/w_clip and the y derived from y_clip/w_clip, with x being the horizontal and y the vertical dimension. The z dimension is only used for depth testing, and whether it points into the screen or out of it is ultimately your choice again (you can switch the depth test comparison direction, or the glDepthRange, or both). The GPU will not care what you use in between at all, so for object space, world space and eye space, you can use whatever conventions you like, or you can make up completely different spaces if that suits your needs better than the traditional model.
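Those two GL-side knobs look like this (a sketch; it assumes a current GL context and a loader such as GLAD are already set up):

```cpp
// #include <glad/glad.h>  // hypothetical loader header; use your project's own

// If your conventions end up with depth *decreasing* into the screen, you
// can flip the GL side instead of changing your matrices.
void useFlippedDepthConventions()
{
    glEnable(GL_DEPTH_TEST);

    // Option 1: reverse the comparison direction (and clear depth to 0.0,
    // which is now the "farthest" value).
    glDepthFunc(GL_GREATER);
    glClearDepth(0.0);

    // Option 2: remap NDC z to window z the other way around instead.
    glDepthRange(1.0, 0.0);
}
```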