High-level context:
In a Unity3D AR project, a machine-learning system provides correspondences between a set of 2D pixel coordinates in an image and the same set of 3D points in world coordinates. From these correspondences, I would like to estimate the camera pose that produced the image.
Low level goal to achieve:
My research suggests using the PnP algorithm (optionally with RANSAC). Perspective-n-Point is the name of this problem: estimating a camera pose from matching 2D-3D point correspondences. PnP problem definition: https://en.wikipedia.org/wiki/Perspective-n-Point
What I've tried
1) I looked for a PnP solver in ARKit but couldn't find one, so I assume it is not exposed.
2) I tried the EmguCV asset from the Asset Store, which should let me use OpenCV within my Unity project. OpenCV solvePnP documentation: https://docs.opencv.org/3.3.0/d9/d0c/group__calib3d.html#ga50620f0e26e02caa2e9adc07b5fbf24e
The question:
Is there a PnP solver exposed in the ARKit framework, and if not, how do I correctly use OpenCV's PnP solver through the EmguCV C# wrapper in a Unity project (which coordinate systems to be aware of, which function parameters to provide, such as the camera intrinsic matrix, and how to interpret the outputs to recover the camera pose)?
Problems I encountered trying to answer the question:
Calling SolvePnPRansac crashed the Unity Editor itself, even though I wrapped the call in a try-catch block (probably my input arguments had an unexpected format). I've had more success with plain SolvePnP, but the results are not what I expect. The documentation states that the output vectors rvec and tvec describe the rotation and translation that bring points from the model coordinate system into the camera coordinate system. So if I place the camera at (0,0,0) looking along the -z direction, and place the object at tvec with Euler rotations rvec, I'd expect the rendered object to look similar to the image I used for the pixel-coordinate correspondences. Did I misunderstand that?
A suspicion I have: in OpenCV's coordinate system, the image y axis goes from top to bottom, while x points right and z points forward. I tried inverting the y axis of the 2D points, and also of the 3D points, but it didn't work.
Edit: [I removed my code here because I changed it a lot since asking the question in order to get it working.]
Related posts (some of many): I looked through the other 41 Stack Overflow questions tagged opencv-solvePnP, but none of them were Unity3D- or C#-related.

- "Camera pose estimation from homography or with solvePnP() function" (got no answers)
- "How can I estimate the camera pose with 3d-to-2d-point-correspondences (using opencv)" (difference: I need to do it in a Unity3D C# project)
- "obtaining 2d-3d point correspondences for pnp or posit" (I get it, I need mathematical algorithms; that's the theory, but how do I use the libraries at my disposal?)
Does ARKit expose a PnP solver? I still don't know. As for how to use OpenCV's solver in a Unity project, here is my own answer:
1) Coordinate systems: make sure your 2D points use the correct vertical coordinate direction (OpenCV's image y axis points downward). Depending on how you obtained those 2D points, there may be nothing to change.
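For example, if your 2D points come from Unity's screen space (origin at the bottom-left, y up), a flip like the following sketch converts them to OpenCV's top-left-origin convention. The helper name is my own, not from any API:

```csharp
using UnityEngine;

// Hypothetical helper: convert a Unity screen-space point (origin bottom-left,
// y up) to OpenCV image coordinates (origin top-left, y down).
static Vector2 ToOpenCvPixel(Vector2 unityScreenPoint, int imageHeightPixels)
{
    return new Vector2(unityScreenPoint.x, imageHeightPixels - unityScreenPoint.y);
}
```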
2) Which intrinsic matrix to use:
// Focal length in pixels, derived from the Unity camera's vertical field of
// view (the name "pixelsPerMeter" is a misnomer; this is fx = fy in pixels).
var pixelsPerMeter = ((float)Screen.height / 2f) / Mathf.Tan(_mainCamera.fieldOfView * Mathf.Deg2Rad / 2f);
var fx = pixelsPerMeter;
var cx = Screen.width * 0.5f;  // principal point: assume the image center
var fy = pixelsPerMeter;
var cy = Screen.height * 0.5f;
var cameraMatrix = new Emgu.CV.Matrix<float>(new float[3, 3] {
    {fx, 0,  cx},
    {0,  fy, cy},
    {0,  0,  1}
});
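With that matrix, a call to EmguCV's solver might look roughly like this. This is a sketch, not a tested implementation: the names of the input point arrays are my own assumptions, and I pass an empty Mat for the distortion coefficients, i.e. I assume no lens distortion:

```csharp
using Emgu.CV;
using Emgu.CV.Structure;
using Emgu.CV.Util;

// model3dPoints: MCvPoint3D32f[] in your world/model coordinates (assumed name)
// pixel2dPoints: PointF[] with top-left-origin pixel coordinates (assumed name)
var objectPoints = new VectorOfPoint3D32F(model3dPoints);
var imagePoints  = new VectorOfPointF(pixel2dPoints);

var distCoeffs = new Mat();           // empty: no lens distortion assumed
var rvec = new Matrix<double>(3, 1);  // output rotation (axis-angle vector)
var tvec = new Matrix<double>(3, 1);  // output translation

bool ok = CvInvoke.SolvePnP(objectPoints, imagePoints, cameraMatrix,
                            distCoeffs, rvec, tvec);
```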
EDIT: so far, the above has only worked for me in the Unity editor; deployment on a mobile device using ARKit still yields incorrect results!
3) How to interpret tvec and rvec from the result:
Use Rodrigues to convert rvec into a rotation matrix. Simply doing Quaternion.Euler(rvec[0], rvec[1], rvec[2]) will NOT work: rvec is an axis-angle (Rodrigues) vector, not a set of Euler angles!
First rotate your object, then translate it by tvec, and finally apply your camera's localToWorld matrix on top to place the object relative to the camera. (If you want the object to stay fixed and the camera to be placed instead, just invert the matrix built from rvec and tvec.)
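The steps above can be sketched as follows, assuming rvec and tvec are 3x1 Emgu.CV.Matrix&lt;double&gt; outputs of SolvePnP and _mainCamera is the Unity camera from earlier. Note this sketch ignores any handedness conversion between OpenCV's y-down and Unity's y-up conventions, which you may still need to handle depending on how your inputs were prepared:

```csharp
using Emgu.CV;
using UnityEngine;

// Convert the axis-angle rvec into a 3x3 rotation matrix.
var rotation = new Matrix<double>(3, 3);
CvInvoke.Rodrigues(rvec, rotation);

// Assemble the object->camera transform (rotation first, then translation).
Matrix4x4 objectToCamera = Matrix4x4.identity;
for (int row = 0; row < 3; row++)
{
    for (int col = 0; col < 3; col++)
        objectToCamera[row, col] = (float)rotation[row, col];
    objectToCamera[row, 3] = (float)tvec[row, 0];
}

// Place the object relative to the camera...
Matrix4x4 objectToWorld =
    _mainCamera.transform.localToWorldMatrix * objectToCamera;

// ...or invert to get the camera pose relative to a fixed object/world.
Matrix4x4 cameraToObject = objectToCamera.inverse;
```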