2D to 3D Projection with given Z in world

I'm sorry if this has been asked before, but I couldn't find a proper answer to my question.

For a better understanding, let me briefly explain the context of my problem.

Context

I have two images (A and B) of a non-planar object. I would like to take the coordinates of a pixel pA in A and project it into B. Since my scene is not planar, I can't use a homography. What I want to do is first project my pixel pA into the 3D world and then project the result into image B to get pB: pA (2D) -> pWorld (3D) -> pB (2D). Fortunately, I know the Z coordinate of pWorld. My question concerns the first step, pA (2D) -> pWorld (3D).

Question

How do I project my 2D point pA = (u,v) into the world (pWorld = (X,Y,Z)), Z being given? I also have the extrinsic matrix Rt (3x4) and the intrinsic matrix K (3x3) of my camera.

What I tried

I know that: s*(u v 1)' = K * Rt * (X Y Z 1)' [1]

s is the scale. But I would like to have the opposite process, Z being given. Something like:

(X Y) = SOMETHING * (u v)

I can rewrite [1] to get s*(u v 1/s 1/s)' = G * (X Y Z 1)'

with G being the 4x4 matrix whose rows are l1, l2, l3 and l4:

l1 = first row of (K*Rt)

l2 = second row of (K*Rt)

l3 = 0 0 1/Z 0

l4 = 0 0 0 1

G is invertible, so I then have (X Y Z 1)' = inv(G) * (s*u s*v 1 1)'

But I can't use that since I don't know the scale s. I think I'm a bit confused about this scale. I know we usually normalize to get rid of it, but here I can't.

Maybe that's not the right way to proceed. If someone can explain the right way to me, I would be really glad to hear it.

Thank you in advance.

Solution

I found a solution but it is damn ugly.

Let's consider the 3x4 matrix M:

M = K*Rt = (mij), 1<=i<=3, 1<=j<=4

For simplification, let's also consider the coefficients A and B:

A = (m12-m32*u)/(m22-m32*v)
B = (m31*u-m11)/(m31*v-m21)

The notation explained, let's move on to the system. As I said, the system is:

s*(u v 1)' = M*(X Y Z 1)'

We have 3 equations and 3 unknowns: s, X and Y. The third row gives:

s = m31*X + m32*Y + m33*Z + m34
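For reference, the forward direction of this system (world point to pixel) can be sketched in a few lines of plain Python. The function name `project` and the example matrix are illustrative assumptions, not part of my project; the matrix corresponds to a hypothetical camera with a focal length of 100 px, principal point (50, 40), no rotation and a translation of 2 along the optical axis.

```python
def project(M, X, Y, Z):
    """Project the world point (X, Y, Z) with the 3x4 matrix M = K*Rt.

    Returns the pixel coordinates (u, v) and the scale s.
    """
    p = (X, Y, Z, 1.0)
    # Each row of M dotted with the homogeneous world point gives s*u, s*v, s.
    su, sv, s = (sum(m * x for m, x in zip(row, p)) for row in M)
    return su / s, sv / s, s


# Hypothetical M = K*Rt (an assumption made only for this example).
M_example = [
    [100.0, 0.0, 50.0, 100.0],
    [0.0, 100.0, 40.0, 80.0],
    [0.0, 0.0, 1.0, 2.0],
]

u, v, s = project(M_example, 1.0, 2.0, 3.0)
print(u, v, s)
```

Here s is exactly the third row applied to the point, which is why it cannot be normalized away before X and Y are known.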

Note that if you want to project into the camera coordinate system rather than the world coordinate system (equivalent to having no rotation and no translation), you have s = Z, which is a much easier system to solve (see, for example, "To calculate world coordinates from screen coordinates with OpenCV").
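As a rough illustration of that simpler camera-coordinates case, here is a minimal sketch. It assumes a zero-skew intrinsic matrix K = [[fx, 0, cx], [0, fy, cy], [0, 0, 1]]; the function name `backproject_camera` and the numeric values are hypothetical.

```python
def backproject_camera(u, v, Z, fx, fy, cx, cy):
    """Return (Xc, Yc, Zc) in camera coordinates for pixel (u, v) at depth Z.

    Assumes a zero-skew K = [[fx, 0, cx], [0, fy, cy], [0, 0, 1]].
    """
    # With s = Z:  Z*u = fx*Xc + cx*Z  and  Z*v = fy*Yc + cy*Z.
    Xc = (u - cx) * Z / fx
    Yc = (v - cy) * Z / fy
    return Xc, Yc, Z


# Hypothetical intrinsics: fx = fy = 100 px, principal point (50, 40).
p_cam = backproject_camera(70.0, 80.0, 5.0, 100.0, 100.0, 50.0, 40.0)
print(p_cam)
```

In this case Z is the depth along the optical axis, not the world Z used in the rest of this answer.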

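With this in mind, we can substitute s into the first two equations and reduce the original system to a system of 2 equations with 2 unknowns (X and Y):

(m11-m31*u)*X + (m12-m32*u)*Y = (m33*u-m13)*Z + m34*u-m14
(m21-m31*v)*X + (m22-m32*v)*Y = (m33*v-m23)*Z + m34*v-m24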

Then, after some calculations, we finally get:

X = [Z*((m23-m33*v)*A-m13+m33*u) + (m24-m34*v)*A-m14+m34*u] / [A*(m31*v-m21)-m31*u+m11]

Y = [Z*((m13-m33*u)-B*(m23-m33*v)) + m14-m34*u-B*(m24-m34*v)] / [B*(m22-m32*v)-m12+m32*u]

These are the expressions of X and Y as functions of u, v and Z. I tested this in my project and it worked.

I don't know if there is a cleaner way to compute this (with matrices and so on), but that's all I could come up with for now.
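One possibly tidier way to organize the same computation is to treat it as a 2x2 linear system in X and Y and solve it with Cramer's rule, which avoids dividing by the intermediate coefficients A and B. The following is only a sketch in plain Python under that assumption; the function name `backproject_world`, the singularity threshold and the example matrix are hypothetical and not taken from my project.

```python
def backproject_world(M, u, v, Z):
    """Solve s*(u, v, 1)' = M*(X, Y, Z, 1)' for X and Y, with Z given.

    M is the 3x4 matrix K*Rt as a list of three rows.
    """
    # Substituting s (third row) into the first two rows gives
    #   a11*X + a12*Y = b1
    #   a21*X + a22*Y = b2
    a11 = M[0][0] - M[2][0] * u
    a12 = M[0][1] - M[2][1] * u
    a21 = M[1][0] - M[2][0] * v
    a22 = M[1][1] - M[2][1] * v
    b1 = (M[2][2] * u - M[0][2]) * Z + (M[2][3] * u - M[0][3])
    b2 = (M[2][2] * v - M[1][2]) * Z + (M[2][3] * v - M[1][3])

    det = a11 * a22 - a12 * a21
    # Crude absolute threshold (an assumption): the system is singular when
    # the pixel's viewing ray has no unique intersection with the plane Z.
    if abs(det) < 1e-12:
        raise ValueError("singular system: no unique (X, Y) for this pixel and Z")

    # Cramer's rule for the 2x2 system.
    X = (b1 * a22 - a12 * b2) / det
    Y = (a11 * b2 - a21 * b1) / det
    return X, Y, Z


# Hypothetical M = K*Rt: focal length 100 px, principal point (50, 40),
# no rotation, translation of 2 along the optical axis.
M_demo = [
    [100.0, 0.0, 50.0, 100.0],
    [0.0, 100.0, 40.0, 80.0],
    [0.0, 0.0, 1.0, 2.0],
]

p_world = backproject_world(M_demo, 70.0, 80.0, 3.0)
print(p_world)
```

A quick sanity check is to project the recovered point back with M and confirm that the original (u, v) comes out again.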