Tags: python, opengl, kivy, projection

Project 3D point to 2D screen coordinate


I'm trying to project a few 3D points to screen coordinates to determine whether a touch occurs in roughly the same area. It should be noted that I'm doing this in Kivy, which is Python and OpenGL. I've seen questions like this before, but I still don't have a solution. I've tried the following, but the numbers it produces are not close to screen coordinates.

def to2D(self, pos, width, height, modelview, projection):
    p = modelview*[pos[0], pos[1], pos[2], 0]
    p = projection*p
    a = p[0]
    b = p[1]
    c = p[2]
    a /= c
    b /= c
    a = (a+1)*width/2.
    b = (b+1)*height/2.
    return (a, b)

To illustrate that this doesn't produce good results, take the following parameters

modelview = [[-0.831470, 0.553001, 0.053372, 0.000000],
             [0.000000, 0.096068, -0.995375, 0.000000],
             [-0.555570, -0.827624, -0.079878, 0.000000],
             [-0.000000, -0.772988, -2.898705, 1.000000]]
projection = [[ 15.763722, 0.000000, 0.000000, 0.000000],
              [ 0.000000, 15.257052, 0.000000, 0.000000],
              [ 0.000000, 0.000000, -1.002002, -2.002002],
              [ 0.000000, 0.000000, -1.000000, 0.000000]]
pos = [0.523355213060808, -0.528964010275341, -0.668054187020413] #I'm working on a unit sphere, so these are more meaningful in spherical coordinates
width = 800
height = 600

With these parameters, to2D gives screen coordinates of (1383, -274)

I don't think the problem is related to OpenGL or Python, but rather to the operations involved in getting from 3D to screen coordinates. What I'm trying to do: when a touch occurs, project a 3D point to 2D screen coordinates.

My idea: Take the camera's modelview and projection matrices, a point that I'm interested in, and the touch position, and then write a method that maps the point to the touch position. Get that method by converting the source code for gluProject into Python.

How I've done it:

  1. Take all of the mathematical objects into Sage for computational simplicity.

  2. My touch position is (150, 114.1)

  3. modelview = matrix([[ -0.862734, 0.503319, 0.048577, 0.000000 ], [ 0.000000, 0.096068, -0.995375, 0.000000 ], [ -0.505657, -0.858744, -0.082881, 0.000000 ], [ 0.000000, -0.772988, -2.898705, 1.000000 ]])

  4. projection = matrix([[ 15.763722, 0.000000, 0.000000, 0.000000 ], [ 0.000000, 15.257052, 0.000000, 0.000000 ], [ 0.000000, 0.000000, -1.002002, -2.002002 ], [ 0.000000, 0.000000, -1.000000, 0.000000 ]])

  5. width = 800.

  6. height = 600.

  7. v4 = vector(QQ, [0.52324, -0.65021, -0.55086, 1.])

  8. p = modelview*v4

  9. p = projection*p

  10. x = p[0]; y = p[1]; z = p[2]; w = p[3]

  11. x /= w; y /= w; z /= w

  12. x = x*0.5 + 0.5; y = y*0.5 + 0.5; z = z*0.5 + 0.5

  13. x = x*width; y = y*height #There's no term added because the widget is located at (0, 0)
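For reference, here are steps 7 through 13 collected into one runnable snippet (a sketch using NumPy rather than Sage; running it reproduces the numbers reported below):

    import numpy as np

    modelview = np.array([[-0.862734, 0.503319, 0.048577, 0.000000],
                          [0.000000, 0.096068, -0.995375, 0.000000],
                          [-0.505657, -0.858744, -0.082881, 0.000000],
                          [0.000000, -0.772988, -2.898705, 1.000000]])
    projection = np.array([[15.763722, 0.000000, 0.000000, 0.000000],
                           [0.000000, 15.257052, 0.000000, 0.000000],
                           [0.000000, 0.000000, -1.002002, -2.002002],
                           [0.000000, 0.000000, -1.000000, 0.000000]])
    width, height = 800., 600.
    v4 = np.array([0.52324, -0.65021, -0.55086, 1.])

    p = modelview.dot(v4)             # steps 8-9: matrix times column vector
    p = projection.dot(p)
    x, y, z, w = p                    # step 10
    x, y, z = x / w, y / w, z / w     # step 11: perspective divide
    x = (x * 0.5 + 0.5) * width       # steps 12-13: map to window coordinates
    y = (y * 0.5 + 0.5) * height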

The result:

x = 15362.18
y = -6251.43
z = 10.14

The revision: Since this is not even close, I went back to steps 8 and 9 and switched the order of multiplication to see what would happen. So now step 8 is p = v4*modelview, and step 9 is p = p*projection; in this case, the vectors are row vectors. Another way of looking at this would be p = modelviewTranspose*v4 and p = projectionTranspose*p, where the vectors are column vectors.
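In the NumPy snippet above, this revision amounts to changing only the two multiplication lines:

    p = v4.dot(modelview)      # row vector times matrix, i.e. modelview.T.dot(v4)
    p = p.dot(projection)      # likewise projection.T.dot(p)

Everything else (the divide by w and the viewport mapping) stays the same.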

The result Part 2:

x = 150.29
y = 196.15
z = 0.6357

Recall that the goal is (150, 114.1). The x coordinate is very good, but the y coordinate is not. So I looked at y*z, which is 124.69. I could live with this answer, although I'm not sure whether looking at y*z is what I should actually be doing.


Solution

  • The first problem is here:

    p = modelview*[pos[0], pos[1], pos[2], 0]
    

    When you multiply a position vector by a matrix as a 4-component vector, the last component (w) must be 1.0.
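    In the question's notation, the fix is the last element of the list:

      p = modelview*[pos[0], pos[1], pos[2], 1.0]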

    Another one is here:

    c = p[2]
    a /= c
    b /= c
    

    Instead of dividing x and y by z, you should divide x, y, AND z by w. With Python's zero-based indexing, w is p[3].
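    With that indexing, the corrected lines read:

      a = p[0]
      b = p[1]
      c = p[2]
      w = p[3]
      a /= w
      b /= w
      c /= w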

    In addition to that:

    When in doubt, find the source code of gluProject and gluUnProject, tear it apart, and convert it to Python.

    As far as I know, when projecting a vector manually to the screen, you're supposed to do the following:

    1. Convert "position" to a 4-component vector, with the .w component set to one.

      v4.x = v3.x
      v4.y = v3.y
      v4.z = v3.z
      v4.w = 1.0
      
    2. Multiply the 4-component vector by both matrices.

    3. Then divide all components by w.

      v4.x /= v4.w
      v4.y /= v4.w
      v4.z /= v4.w
      

    THEN you'll get normalized device coordinates within the -1.0..+1.0 range for x and y. (In OpenGL, z is also within -1.0..+1.0 at this point; the depth-range transform later maps it to 0.0..1.0.) Remapping x and y from that range to the viewport gives the screen coordinates.
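    Putting this together with the question's to2D, a corrected version might look like the following sketch. It assumes NumPy arrays and uses the row-vector multiplication order that the asker found necessary for the matrices Kivy hands back; a column-vector convention would use modelview.dot(v) instead.

      import numpy as np

      def to2D(pos, width, height, modelview, projection):
          # Homogeneous position with w = 1.0, not 0.0.
          v = np.array([pos[0], pos[1], pos[2], 1.0])
          # Row vector times matrix, matching the order that worked
          # in the question (equivalent to multiplying by transposes).
          p = v.dot(np.asarray(modelview))
          p = p.dot(np.asarray(projection))
          # Perspective divide by w (p[3]), not by z.
          x = p[0] / p[3]
          y = p[1] / p[3]
          # Map from the -1..1 range to window coordinates.
          return ((x + 1.0) * width / 2.0, (y + 1.0) * height / 2.0)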

    The reason w comes into play is that you can't divide via matrix multiplication, so when you need to divide x, y, and z by something, you put that something into the w component, and the division is performed after all the matrix multiplications. w is also what makes translation matrices possible: any vector with w == 0 cannot be translated by a translation matrix, only rotated around the origin and deformed with affine transforms ("origin" meaning the zero point of the coordinate space, (0.0, 0.0, 0.0)).
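    A quick NumPy illustration of that last point (a sketch; T is a standard column-major translation matrix):

      import numpy as np

      T = np.array([[1., 0., 0., 5.],    # translate by (5, 0, 0)
                    [0., 1., 0., 0.],
                    [0., 0., 1., 0.],
                    [0., 0., 0., 1.]])
      point     = np.array([1., 2., 3., 1.])   # w == 1: a position
      direction = np.array([1., 2., 3., 0.])   # w == 0: a direction

      print(T.dot(point))       # [6. 2. 3. 1.] -> translated
      print(T.dot(direction))   # [1. 2. 3. 0.] -> unaffected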

    P.S. I don't know offhand how Python handles integer-to-float conversion, but I'd replace a = (a+1)*width/2 with a = (a+1.0)*width/2.0 to make it explicit that you're working with floating-point numbers here.
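    For context, the hazard is Python 2's integer division (in Python 3, / always performs true division):

      width = 800
      print(width / 2)     # Python 2: 400 (an int); Python 3: 400.0
      print(width / 2.0)   # 400.0 in both; explicit floats are the safe form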