Tags: math, matrix, 3d, projection-matrix, inverse

How to calculate ray in real-world coordinate system from image using projection matrix?


Given n images and a projection matrix for each image, how can I calculate the ray (line) emitted by each pixel of the images, which intersects one of the three planes of the real-world coordinate system? The object captured by the camera is at the same position in every image; only the camera's position differs, which is why there is a separate projection matrix for each image.

As far as my research suggests, this is the inverse of the 3D to 2D projection. Since information is lost when projecting to 2D, it's only possible to calculate the ray (line) in the real-world coordinate system, which is fine.

An example projection matrix P that I calculated from the given K, R and t components, according to P = K*[R t]:

   3310.400000 0.000000 316.730000 
K= 0.000000 3325.500000 200.550000 
   0.000000 0.000000 1.000000 


   -0.14396457836077139000 0.96965263281337499000 0.19760617153779569000 
R= -0.90366580603479685000 -0.04743335255026152200 -0.42560419233334673000 
   -0.40331536459778505000 -0.23984130575212276000 0.88306936201487163000 

   -0.010415508744 
t= -0.0294278883669 
   0.673097816109

   -604.322  3133.973   933.850   178.711
P= -3086.026  -205.840 -1238.247    37.127
   -0.403    -0.240     0.883     0.673
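The composition P = K*[R t] can be sketched in Java with Apache Commons Math. This is one possible implementation of the `getP` helper used in the code below (the actual helper isn't shown in the question, so treat this as an assumption):

```java
import org.apache.commons.math3.linear.MatrixUtils;
import org.apache.commons.math3.linear.RealMatrix;

public class ProjectionDemo {
    // Build P = K * [R | t]: append t as a fourth column to R, then left-multiply by K.
    static RealMatrix getP(double[][] k, double[][] r, double[] t) {
        RealMatrix K = MatrixUtils.createRealMatrix(k);
        RealMatrix Rt = MatrixUtils.createRealMatrix(3, 4);
        Rt.setSubMatrix(r, 0, 0);
        Rt.setColumn(3, t);
        return K.multiply(Rt);
    }

    public static void main(String[] args) {
        // K, R, t from the example above
        double[][] k = {{3310.4, 0, 316.73}, {0, 3325.5, 200.55}, {0, 0, 1}};
        double[][] r = {
            {-0.14396457836077139, 0.96965263281337499, 0.19760617153779569},
            {-0.90366580603479685, -0.047433352550261522, -0.42560419233334673},
            {-0.40331536459778505, -0.23984130575212276, 0.88306936201487163}};
        double[] t = {-0.010415508744, -0.0294278883669, 0.673097816109};
        RealMatrix P = getP(k, r, t);
        // P(2,3) is t_z because the last row of K is [0 0 1]
        System.out.printf("%.3f%n", P.getEntry(2, 3)); // prints 0.673
    }
}
```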

I am using the "DinoSparseRing" data set available at http://vision.middlebury.edu/mview/data

for (int i = 0; i < 16; i++) {
    RealMatrix rotationMatrix = MatrixUtils.createRealMatrix(rotationMatrices[i]);
    RealVector translationVector = MatrixUtils.createRealVector(translationMatrices[i]);
    // construct the projection matrix according to P = K*[R t]
    RealMatrix projMatrix = getP(kalibrationMatrices[i], rotationMatrices[i], translationMatrices[i]);
    // getM returns the first 3x3 block M of the 3x4 projection matrix P = [M | p4]
    RealMatrix projMInverse = MatrixUtils.inverse(getM(projMatrix));
    RealVector p4 = projMatrix.getColumnVector(3);
    // compute the camera center C = -R^T * t
    RealVector c = rotationMatrix.transpose().scalarMultiply(-1.0).operate(translationVector);

    // compute the unprojected point and direction vector for every pixel
    for (int m = 0; m < image_m_num_pixel; m++) {
        for (int n = 0; n < image_n_num_pixel; n++) {
            // homogeneous pixel coordinate (undoing the perspective divide is a no-op since w = 1)
            RealVector projectedPointVector = MatrixUtils.createRealVector(new double[]{n, m, 1});
            // a world point X projecting to pixel x satisfies M*X = x - p4,
            // so subtract the fourth column of P before applying M^-1
            RealVector unprojectedPointVector = projMInverse.operate(projectedPointVector.subtract(p4));

            // direction from the camera center to the unprojected point, normalized
            RealVector directionVector = unprojectedPointVector.subtract(c);
            directionVector = directionVector.mapDivide(directionVector.getNorm());
        }
    }
}
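As a cross-check on the loop above, the same quantities can be read off P = [M | p4] directly: the camera center is C = -M⁻¹p4 (which equals -Rᵀt when P = K*[R t]), and the back-projected ray through pixel x = (n, m, 1)ᵀ leaves C in direction M⁻¹x. A minimal sketch, assuming Apache Commons Math and the example P from the question (rounded to three decimals):

```java
import org.apache.commons.math3.linear.MatrixUtils;
import org.apache.commons.math3.linear.RealMatrix;
import org.apache.commons.math3.linear.RealVector;

public class RayCheck {
    // camera center C = -M^-1 * p4 (equals -R^T * t when P = K*[R t])
    static RealVector center(RealMatrix M, RealVector p4) {
        return MatrixUtils.inverse(M).operate(p4).mapMultiply(-1.0);
    }

    // normalized ray direction through pixel (n, m): d = M^-1 * (n, m, 1)^T
    static RealVector direction(RealMatrix M, double n, double m) {
        RealVector d = MatrixUtils.inverse(M)
                .operate(MatrixUtils.createRealVector(new double[]{n, m, 1}));
        return d.mapDivide(d.getNorm());
    }

    public static void main(String[] args) {
        // M and p4 taken from the question's printed P
        RealMatrix M = MatrixUtils.createRealMatrix(new double[][]{
                {-604.322, 3133.973, 933.850},
                {-3086.026, -205.840, -1238.247},
                {-0.403, -0.240, 0.883}});
        RealVector p4 = MatrixUtils.createRealVector(new double[]{178.711, 37.127, 0.673});
        RealVector c = center(M, p4);
        RealVector d = direction(M, 100, 50);
        // sanity check: any point C + s*d reprojects onto pixel (100, 50)
        RealVector reproj = M.operate(c.add(d.mapMultiply(2.0))).add(p4);
        System.out.println(reproj.getEntry(0) / reproj.getEntry(2)); // ≈ 100
    }
}
```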

The following two plots show the outer rays for each of the 16 images. The blue end marks the camera position and the cyan box is a bounding box containing the object captured by the camera. One can clearly see the rays projecting back to the object in the world coordinate system.



Solution

  • To define the ray you need a start point (which is the camera/eye position) and a direction vector, which can be calculated using any point on the ray.

    For a given pixel in the image, you have a projected X and Y (zeroed at the center of the image) but no Z depth value. However the real-world co-ordinates corresponding to all possible depth values for that pixel will all lie on the ray you are trying to calculate, so you can just choose any arbitrary non-zero Z value, since any point on the ray will do.

    float projectedX = (x - imageCenterX) / (imageWidth * 0.5f);
    float projectedY = (y - imageCenterY) / (imageHeight * 0.5f);
    float projectedZ = 1.0f; // any arbitrary value
    

    Now that you have a 3D projected co-ordinate you can undo the projection by applying the perspective divide in reverse by multiplying X and Y by Z, then multiplying the result by the inverse projection matrix to get the unprojected point.

    // undo perspective divide (redundant if projectedZ = 1, shown for completeness)
    projectedX *= projectedZ;
    projectedY *= projectedZ;
    Vector3 projectedPoint = new Vector3(projectedX, projectedY, projectedZ);
    
    // undo projection by multiplying by inverse:
    Matrix invProjectionMat = projectionMat.inverse();
    Vector3 unprojectedPoint = invProjectionMat.multiply(projectedPoint);
    

    Subtract the camera position from the unprojected point to get the direction vector from the camera to the point, then normalize it. (This step assumes that the projection matrix defines both the camera position and orientation; if the position is stored separately, you don't need to do the subtraction.)

    Vector3 directionVector = unprojectedPoint.subtract(cameraPosition);
    directionVector.normalize();
    

    The ray is defined by the camera position and the normalized direction vector. You can then intersect it with any of the X, Y, Z planes.
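    The final intersection step can be sketched as follows. This is a generic ray/axis-aligned-plane intersection (the helper name and example numbers are hypothetical, not from the original answer):

    ```java
    public class RayPlaneDemo {
        /**
         * Intersects the ray origin + s*dir with the axis-aligned plane where
         * coordinate `axis` (0 = X, 1 = Y, 2 = Z) equals `value`.
         * Returns null if the ray is parallel to the plane.
         */
        static double[] intersectAxisPlane(double[] origin, double[] dir, int axis, double value) {
            if (Math.abs(dir[axis]) < 1e-12) return null; // parallel to the plane
            double s = (value - origin[axis]) / dir[axis];
            return new double[]{origin[0] + s * dir[0],
                                origin[1] + s * dir[1],
                                origin[2] + s * dir[2]};
        }

        public static void main(String[] args) {
            double[] camera = {0.243, 0.191, -0.345}; // example camera center
            double[] dir = {0.0, 0.0, 1.0};           // example normalized direction
            double[] hit = intersectAxisPlane(camera, dir, 2, 0.0); // plane z = 0
            System.out.printf("%.3f %.3f %.3f%n", hit[0], hit[1], hit[2]); // prints 0.243 0.191 0.000
        }
    }
    ```

    Note that a negative s means the plane lies behind the camera, which you may want to reject depending on the application.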