opengl, opengl-3, glulookat

Derivation of (glu)lookAt


I am trying to learn (modern) OpenGL and I am thoroughly confused about the various transformations...

The viewing matrix has me confused, so I need some clarification.

Here's what I have understood about the (conventional) pipeline.

  1. Vertices are specified in world space; they are scaled, translated, rotated, etc. to the required positions using the modelling matrix.
  2. (Here's where I start to get confused) We can (optionally) position a virtual camera at the required location using a "lookAt" function (gluLookAt). I followed the derivation of the matrix here: http://www.youtube.com/watch?v=s9FhcvjM7Hk. I understood it up to the point where the professor calculates the "look-at" vector. He says that the look-at vector = eye - center. Here is where I begin to get lost. My first instinct is that the vector should be center - eye. Suppose the center vector is supplied as (0,0,0) and the eye vector is (0,0,5). To look at the object, the camera should point towards center - eye = (0,0,-5). However, the professor states that we want to move center - eye to the -z direction (what does that mean?), and that therefore eye - center gives the look-at direction. I am confused about this.

     He further adds that in OpenGL there is a camera at the origin looking at (0,0,-1). This I completely do not understand. I do understand that the viewing transformation is nothing but applying the inverse transformation to the objects. I experimented a little and found that when I drew a triangle with a z-value of 1 (and absolutely no modelview/projection transforms), it was still drawn on the screen; a minimal sketch of that test is below. However, I wouldn't expect this to be so, since the camera is at the origin.
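For reference, the test was essentially just this (a minimal sketch; window/context creation omitted, and old-style immediate mode used only to keep the test small):

    /* No modelview or projection matrix is set anywhere, so both are
       left at their identity defaults. The triangle's z-value is 1. */
    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
    glBegin(GL_TRIANGLES);
        glVertex3f(-0.5f, -0.5f, 1.0f);
        glVertex3f( 0.5f, -0.5f, 1.0f);
        glVertex3f( 0.0f,  0.5f, 1.0f);
    glEnd();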

Now, to sum up...

  • Why is look at = eye - center?
  • What is this about the camera being at the origin and looking at z=-1?

Any explanations/pointers?


Solution

  • When you render a triangle, the vertices' coordinates are interpreted as follows:

    • The x-coordinate will influence the horizontal position on the viewport. -1 is the left edge and +1 is the right edge.
    • The y-coordinate will influence the vertical position on the viewport. -1 is the bottom edge and +1 is the top edge.
    • The z-coordinate will influence the depth information. -1 is at the near plane (the camera's plane) and +1 is at the far plane. This value is usually written to the depth buffer.

    That's why your simple example renders a visible triangle at the far plane.

    Now let's come to the view transformation. The transformation is constructed from four vectors: the image of (1, 0, 0), the image of (0, 1, 0), the image of (0, 0, 1), and a translation vector. However, since the view transformation is an inverse transformation, the resulting matrix has to be inverted.
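    As a sketch of that idea (using GLM purely for illustration; the function name and parameters are hypothetical), build the camera's own matrix from those four vectors and then invert it:

        #include <glm/glm.hpp>

        // Build the camera's world matrix column by column from its basis
        // vectors and position, then invert it to obtain the view matrix.
        glm::mat4 viewFromCameraFrame(const glm::vec3& right,    // image of (1, 0, 0)
                                      const glm::vec3& up,       // image of (0, 1, 0)
                                      const glm::vec3& back,     // image of (0, 0, 1)
                                      const glm::vec3& position) // translation
        {
            glm::mat4 cameraToWorld(glm::vec4(right,    0.0f),
                                    glm::vec4(up,       0.0f),
                                    glm::vec4(back,     0.0f),
                                    glm::vec4(position, 1.0f));
            return glm::inverse(cameraToWorld); // view = inverse of the camera matrix
        }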

    You are right that the view direction is center - eye. However, that is not what we need for the matrix. We need the image of (0, 0, 1). Usually, OpenGL programs use a right-handed coordinate system, and in that system the camera looks in the negative z-direction. So center - eye is actually the image of (0, 0, -1), and the image of (0, 0, 1) is then just eye - center. That's what you need.
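    Concretely, a gluLookAt-style view matrix can be put together roughly like this (a sketch using GLM, not the actual gluLookAt source; myLookAt is just an illustrative name). Note that the third basis vector is eye - center:

        #include <glm/glm.hpp>

        glm::mat4 myLookAt(const glm::vec3& eye, const glm::vec3& center, const glm::vec3& up)
        {
            // The camera looks down its own negative z-axis, so the image of
            // (0, 0, 1) points from the center back towards the eye.
            glm::vec3 zaxis = glm::normalize(eye - center);
            glm::vec3 xaxis = glm::normalize(glm::cross(up, zaxis));
            glm::vec3 yaxis = glm::cross(zaxis, xaxis);

            // Inverse of the camera's matrix: the rotation rows are the camera
            // axes, and the translation projects the eye onto each axis.
            glm::mat4 view(1.0f);
            view[0][0] = xaxis.x; view[1][0] = xaxis.y; view[2][0] = xaxis.z;
            view[0][1] = yaxis.x; view[1][1] = yaxis.y; view[2][1] = yaxis.z;
            view[0][2] = zaxis.x; view[1][2] = zaxis.y; view[2][2] = zaxis.z;
            view[3][0] = -glm::dot(xaxis, eye);
            view[3][1] = -glm::dot(yaxis, eye);
            view[3][2] = -glm::dot(zaxis, eye);
            return view;
        }

    With your numbers, eye = (0, 0, 5) and center = (0, 0, 0), zaxis comes out as (0, 0, 1), which is eye - center normalized, while the direction the camera actually looks in is -zaxis = (0, 0, -1).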

    With this definition you will also need an appropriate projection transformation. Otherwise you will only see things behind the camera (because that's where the z-coordinate is positive and, hence, the depth value is positive). The projection transformation is responsible for turning negative z-coordinates into positive depth values.
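    A minimal end-to-end check (GLM again; the field of view, aspect ratio and near/far values are arbitrary choices) shows a point in front of the camera getting a negative eye-space z and, after projection and the perspective divide, an NDC depth inside [-1, 1]:

        #include <cstdio>
        #include <glm/glm.hpp>
        #include <glm/gtc/matrix_transform.hpp>

        int main()
        {
            glm::mat4 view = glm::lookAt(glm::vec3(0.0f, 0.0f, 5.0f),  // eye
                                         glm::vec3(0.0f, 0.0f, 0.0f),  // center
                                         glm::vec3(0.0f, 1.0f, 0.0f)); // up
            glm::mat4 proj = glm::perspective(glm::radians(60.0f), 4.0f / 3.0f, 0.1f, 100.0f);

            glm::vec4 eyeSpace = view * glm::vec4(0.0f, 0.0f, 0.0f, 1.0f); // world origin -> (0, 0, -5, 1)
            glm::vec4 clip     = proj * eyeSpace;
            glm::vec3 ndc      = glm::vec3(clip) / clip.w;                 // perspective divide

            std::printf("eye-space z = %f, ndc z = %f\n", eyeSpace.z, ndc.z); // ndc.z lies in [-1, 1]
            return 0;
        }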