Search code examples
javagraphics3dbezier

How to convert a 3D point into 2D perspective projection?


I am currently working with using Bezier curves and surfaces to draw the famous Utah teapot. Using Bezier patches of 16 control points, I have been able to draw the teapot and display it using a 'world to camera' function which gives the ability to rotate the resulting teapot, and am currently using an orthographic projection.

The result is that I have a 'flat' teapot, which is expected as the purpose of an orthographic projection is to preserve parallel lines.

However, I would like to use a perspective projection to give the teapot depth. My question is, how does one take the 3D xyz vertex returned from the 'world to camera' function, and convert this into a 2D coordinate. I am wanting to use the projection plane at z=0, and allow the user to determine the focal length and image size using the arrow keys on the keyboard.

I am programming this in java and have all of the input event handler set up, and have also written a matrix class which handles basic matrix multiplication. I've been reading through wikipedia and other resources for a while, but I can't quite get a handle on how one performs this transformation.


Solution

  • The standard way to represent 2D/3D transformations nowadays is by using homogeneous coordinates. [x,y,w] for 2D, and [x,y,z,w] for 3D. Since you have three axes in 3D as well as translation, that information fits perfectly in a 4x4 transformation matrix. I will use column-major matrix notation in this explanation. All matrices are 4x4 unless noted otherwise.
    The stages from 3D points and to a rasterized point, line or polygon looks like this:

    1. Transform your 3D points with the inverse camera matrix, followed with whatever transformations they need. If you have surface normals, transform them as well but with w set to zero, as you don't want to translate normals. The matrix you transform normals with must be isotropic; scaling and shearing makes the normals malformed.
    2. Transform the point with a clip space matrix. This matrix scales x and y with the field-of-view and aspect ratio, scales z by the near and far clipping planes, and plugs the 'old' z into w. After the transformation, you should divide x, y and z by w. This is called the perspective divide.
    3. Now your vertices are in clip space, and you want to perform clipping so you don't render any pixels outside the viewport bounds. Sutherland-Hodgeman clipping is the most widespread clipping algorithm in use.
    4. Transform x and y with respect to w and the half-width and half-height. Your x and y coordinates are now in viewport coordinates. w is discarded, but 1/w and z is usually saved because 1/w is required to do perspective-correct interpolation across the polygon surface, and z is stored in the z-buffer and used for depth testing.

    This stage is the actual projection, because z isn't used as a component in the position any more.

    The algorithms:

    Calculation of field-of-view

    This calculates the field-of view. Whether tan takes radians or degrees is irrelevant, but angle must match. Notice that the result reaches infinity as angle nears 180 degrees. This is a singularity, as it is impossible to have a focal point that wide. If you want numerical stability, keep angle less or equal to 179 degrees.

    fov = 1.0 / tan(angle/2.0)
    

    Also notice that 1.0 / tan(45) = 1. Someone else here suggested to just divide by z. The result here is clear. You would get a 90 degree FOV and an aspect ratio of 1:1. Using homogeneous coordinates like this has several other advantages as well; we can for example perform clipping against the near and far planes without treating it as a special case.

    Calculation of the clip matrix

    This is the layout of the clip matrix. aspectRatio is Width/Height. So the FOV for the x component is scaled based on FOV for y. Far and near are coefficients which are the distances for the near and far clipping planes.

    [fov * aspectRatio][        0        ][        0              ][        0       ]
    [        0        ][       fov       ][        0              ][        0       ]
    [        0        ][        0        ][(far+near)/(far-near)  ][        1       ]
    [        0        ][        0        ][(2*near*far)/(near-far)][        0       ]
    

    Screen Projection

    After clipping, this is the final transformation to get our screen coordinates.

    new_x = (x * Width ) / (2.0 * w) + halfWidth;
    new_y = (y * Height) / (2.0 * w) + halfHeight;
    

    Trivial example implementation in C++

    #include <vector>
    #include <cmath>
    #include <stdexcept>
    #include <algorithm>
    
    struct Vector
    {
        Vector() : x(0),y(0),z(0),w(1){}
        Vector(float a, float b, float c) : x(a),y(b),z(c),w(1){}
    
        /* Assume proper operator overloads here, with vectors and scalars */
        float Length() const
        {
            return std::sqrt(x*x + y*y + z*z);
        }
        
        Vector Unit() const
        {
            const float epsilon = 1e-6;
            float mag = Length();
            if(mag < epsilon){
                std::out_of_range e("");
                throw e;
            }
            return *this / mag;
        }
    };
    
    inline float Dot(const Vector& v1, const Vector& v2)
    {
        return v1.x*v2.x + v1.y*v2.y + v1.z*v2.z;
    }
    
    class Matrix
    {
        public:
        Matrix() : data(16)
        {
            Identity();
        }
        void Identity()
        {
            std::fill(data.begin(), data.end(), float(0));
            data[0] = data[5] = data[10] = data[15] = 1.0f;
        }
        float& operator[](size_t index)
        {
            if(index >= 16){
                std::out_of_range e("");
                throw e;
            }
            return data[index];
        }
        Matrix operator*(const Matrix& m) const
        {
            Matrix dst;
            int col;
            for(int y=0; y<4; ++y){
                col = y*4;
                for(int x=0; x<4; ++x){
                    for(int i=0; i<4; ++i){
                        dst[x+col] += m[i+col]*data[x+i*4];
                    }
                }
            }
            return dst;
        }
        Matrix& operator*=(const Matrix& m)
        {
            *this = (*this) * m;
            return *this;
        }
    
        /* The interesting stuff */
        void SetupClipMatrix(float fov, float aspectRatio, float near, float far)
        {
            Identity();
            float f = 1.0f / std::tan(fov * 0.5f);
            data[0] = f*aspectRatio;
            data[5] = f;
            data[10] = (far+near) / (far-near);
            data[11] = 1.0f; /* this 'plugs' the old z into w */
            data[14] = (2.0f*near*far) / (near-far);
            data[15] = 0.0f;
        }
    
        std::vector<float> data;
    };
    
    inline Vector operator*(const Vector& v, const Matrix& m)
    {
        Vector dst;
        dst.x = v.x*m[0] + v.y*m[4] + v.z*m[8 ] + v.w*m[12];
        dst.y = v.x*m[1] + v.y*m[5] + v.z*m[9 ] + v.w*m[13];
        dst.z = v.x*m[2] + v.y*m[6] + v.z*m[10] + v.w*m[14];
        dst.w = v.x*m[3] + v.y*m[7] + v.z*m[11] + v.w*m[15];
        return dst;
    }
    
    typedef std::vector<Vector> VecArr;
    VecArr ProjectAndClip(int width, int height, float near, float far, const VecArr& vertex)
    {
        float halfWidth = (float)width * 0.5f;
        float halfHeight = (float)height * 0.5f;
        float aspect = (float)width / (float)height;
        Vector v;
        Matrix clipMatrix;
        VecArr dst;
        clipMatrix.SetupClipMatrix(60.0f * (M_PI / 180.0f), aspect, near, far);
        /*  Here, after the perspective divide, you perform Sutherland-Hodgeman clipping 
            by checking if the x, y and z components are inside the range of [-w, w].
            One checks each vector component seperately against each plane. Per-vertex
            data like colours, normals and texture coordinates need to be linearly
            interpolated for clipped edges to reflect the change. If the edge (v0,v1)
            is tested against the positive x plane, and v1 is outside, the interpolant
            becomes: (v1.x - w) / (v1.x - v0.x)
            I skip this stage all together to be brief.
        */
        for(VecArr::iterator i=vertex.begin(); i!=vertex.end(); ++i){
            v = (*i) * clipMatrix;
            v /= v.w; /* Don't get confused here. I assume the divide leaves v.w alone.*/
            dst.push_back(v);
        }
    
        /* TODO: Clipping here */
    
        for(VecArr::iterator i=dst.begin(); i!=dst.end(); ++i){
            i->x = (i->x * (float)width) / (2.0f * i->w) + halfWidth;
            i->y = (i->y * (float)height) / (2.0f * i->w) + halfHeight;
        }
        return dst;
    }
    

    If you still ponder about this, the OpenGL specification is a really nice reference for the maths involved. The DevMaster forums at http://www.devmaster.net/ have a lot of nice articles related to software rasterizers as well.