Tags: computer-vision, coordinate-transformation, slam

Monocular SLAM: initial world coordinate system transformation


The initial coordinate system of monocular SLAM is arbitrary, and its scale is unknown.

However, I want to know how to transform this initial coordinate system into another coordinate system defined by a marker, such as a chessboard.

Are there any papers or blog posts on this?

Thanks a lot!


Solution

  • This is a difficult problem in the monocular setting, and IMUs give pretty good results for it (e.g. here). However, it appears that you currently have no sensors other than the camera. In that case, estimating the scale from a chessboard or markers is not an ideal solution, because it requires a lot of control over the motion of your camera at initialization. For example, here is one simple approach that comes to mind. Keep the chessboard exactly vertical, and place your camera at a distance N from it, with the optical axis orthogonal to the board. Then move the camera exactly parallel to the board for a time t. During this interval, every feature that you detect on the board will be at depth N from the camera. If the depth of those features in your SLAM coordinates is s, then your scale will be N/s. However, keeping the movement exactly parallel is awkward. I expect (feel free to correct me) that other marker-based approaches will be equally bad.
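
    The scale relation above fits in a few lines of Python. This is a minimal sketch. It assumes you can read the depth (along the optical axis, in SLAM units) of each board feature tracked during the parallel motion. The function name `scale_from_parallel_motion` is hypothetical. Taking the median of several depths instead of a single value is my own choice, made to damp noise.

```python
import statistics

def scale_from_parallel_motion(board_distance_m, slam_depths):
    """Hypothetical helper: metric scale N / s from a fronto-parallel sweep.

    board_distance_m: known camera-to-board distance N (metres), kept constant
        while the camera translates parallel to the board.
    slam_depths: SLAM-unit depths of board features seen during the sweep.
        Ideally they all equal s; the median is a robust stand-in for s.
    """
    if board_distance_m <= 0:
        raise ValueError("board distance must be positive")
    s = statistics.median(slam_depths)
    if s <= 0:
        raise ValueError("SLAM depths must be positive")
    return board_distance_m / s  # metres per SLAM unit

# Board at 0.8 m; the SLAM system reports depths of roughly 2.0 units.
scale = scale_from_parallel_motion(0.8, [1.98, 2.0, 2.02, 2.1, 1.95])
```

    With N = 0.8 m and a median SLAM depth of 2.0, the scale is 0.4. Multiplying SLAM coordinates by 0.4 therefore expresses them in metres.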

    A better but considerably more time-consuming option (from a development point of view) is to use a model-based tracker. There is a rich literature on this; a very old example that comes to mind is PWP3D. Take a known object in your environment, ideally a simple one for which you can easily obtain a CAD model at true metric scale. Your problem then becomes aligning your SLAM reference frame with the reference frame of the object. To do that, you can detect contours in the original images, project the CAD model, and align the two as closely as possible. Note that you have to align these contours in a sufficient number of images taken from different viewpoints, in a bundle adjustment-like manner.
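
    To make the "bundle adjustment-like" alignment more concrete, here is a sketch of the kind of multi-view cost it would minimise. The simplifications are my assumptions, not part of the answer. It uses point correspondences between the CAD model and the images instead of contours. It assumes a known pinhole intrinsic matrix K and camera poses taken from the SLAM system. All function names are hypothetical.

```python
import numpy as np

def project(K, R_cw, t_cw, X_w):
    """Pinhole projection of Nx3 points given a SLAM camera pose (world -> camera)."""
    X_c = X_w @ R_cw.T + t_cw           # points in the camera frame
    x = X_c @ K.T                       # homogeneous pixel coordinates
    return x[:, :2] / x[:, 2:3]         # perspective division

def multiview_residual(s, R, T, model_pts, views, K):
    """Stack reprojection errors of the CAD model over several views.

    (s, R, T) map metric object coordinates into the SLAM frame as
    X_slam = s * (R @ X_obj + T), i.e. sim = [sR sT; 0 1].
    views: list of (R_cw, t_cw, observed_px), where (R_cw, t_cw) is the
    SLAM camera pose and observed_px holds the Nx2 detections of model_pts.
    """
    X_slam = s * (model_pts @ R.T + T)
    errs = [project(K, R_cw, t_cw, X_slam) - obs for R_cw, t_cw, obs in views]
    return np.concatenate(errs).ravel()

# Synthetic check: a small 4-point "model" seen from two camera positions.
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
model = np.array([[0.0, 0.0, 0.0], [0.1, 0.0, 0.0], [0.0, 0.1, 0.0], [0.1, 0.1, 0.05]])
s_true, R_true, T_true = 2.0, np.eye(3), np.array([0.0, 0.0, 1.0])
X_true = s_true * (model @ R_true.T + T_true)
views = []
for tx in (0.0, 0.2):                   # camera centres at x = 0 and x = 0.2 (SLAM units)
    R_cw, t_cw = np.eye(3), np.array([-tx, 0.0, 0.0])
    views.append((R_cw, t_cw, project(K, R_cw, t_cw, X_true)))

r_true = multiview_residual(s_true, R_true, T_true, model, views, K)
r_wrong_s = multiview_residual(1.5, R_true, T_true, model, views, K)
```

    In this synthetic setup, the wrong scale s = 1.5 leaves the first view's residuals at zero, because the scaling happens about that camera's centre. Only the second viewpoint exposes the error, which is why the alignment needs several viewpoints. A generic nonlinear least-squares solver (for example `scipy.optimize.least_squares`) could then minimise this residual over s, R and T, given a rotation parameterisation and a reasonable initial guess.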

    Another possibility is to use a neural network to predict depth (there is a rich literature on the subject), but the resulting scale estimate will usually be less precise.
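
    The answer does not say how the predicted depth would be turned into a scale. One common heuristic, assumed here, is to compare the network's metric depth with the SLAM depth of the same tracked points and take the median ratio, which tolerates a few bad predictions. The function name is hypothetical.

```python
import statistics

def scale_from_predicted_depth(predicted_depths_m, slam_depths):
    """Hypothetical helper: median of per-point ratios (network metres / SLAM units)."""
    if len(predicted_depths_m) != len(slam_depths):
        raise ValueError("need one predicted depth per SLAM point")
    # Skip non-positive depths, which are invalid for a ratio.
    ratios = [p / q for p, q in zip(predicted_depths_m, slam_depths) if p > 0 and q > 0]
    if not ratios:
        raise ValueError("no valid depth pairs")
    return statistics.median(ratios)

# The fourth point is an outlier (ratio 6.0); the median stays near the inliers.
scale = scale_from_predicted_depth([1.0, 2.1, 3.0, 9.0], [0.5, 1.0, 1.5, 1.5])
```

    Here the ratios are 2.0, 2.1, 2.0 and 6.0. The median is 2.05, close to the inlier value of about 2, whereas the mean (about 3.03) would be pulled up by the outlier.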

    Addressing alignment with ground truth coordinates:

    I understand from your comment that you want to align your SLAM coordinates with a previously known reference frame. Looking into georeferenced SLAM systems may help you, since they face the same problem at initialization. Now, to come back to the problem at hand, here is how I'd do it:

    [Figure: feature matches between the two coordinate systems]

    Let G denote your desired coordinate system, and let S be the SLAM reference frame. Your SLAM algorithm will reconstruct features of the marker, which we will call f'_1, f'_2, f'_3. These correspond to features f_1, f_2, f_3 on your marker. It is important that you can match those features correctly based on their appearance (texture, color, etc.). Once you have those matches, the problem is to find the similarity sim=[s*R s*T; 0 1] (in MATLAB/Octave-like notation), where R is the rotation, T is the translation and s is the scale parameter. You should now be able to formulate your problem as something like
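
    As a small illustration of the notation, the sketch below builds the 4x4 matrix [sR sT; 0 1] and applies it to points in homogeneous coordinates. It also applies the inverse, which is what you need to express SLAM coordinates in the marker frame G. It assumes NumPy, and the function names are hypothetical.

```python
import numpy as np

def make_sim(s, R, T):
    """4x4 similarity [sR sT; 0 1]: maps marker-frame points (G) into the SLAM frame (S)."""
    sim = np.eye(4)
    sim[:3, :3] = s * np.asarray(R, dtype=float)
    sim[:3, 3] = s * np.asarray(T, dtype=float)
    return sim

def apply_sim(sim, pts):
    """Apply a 4x4 similarity to Nx3 points via homogeneous coordinates."""
    pts = np.asarray(pts, dtype=float)
    homo = np.hstack([pts, np.ones((len(pts), 1))])
    return (homo @ sim.T)[:, :3]        # last row is [0 0 0 1], so w stays 1

def slam_to_marker(sim, pts_slam):
    """Inverse direction S -> G: express SLAM points in the marker frame."""
    return apply_sim(np.linalg.inv(sim), pts_slam)

# Example: scale 2, 90-degree rotation about z, translation (1, 0, 0).
R = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
sim = make_sim(2.0, R, [1.0, 0.0, 0.0])
p_slam = apply_sim(sim, [[1.0, 0.0, 0.0]])
p_back = slam_to_marker(sim, p_slam)
```

    The marker point (1, 0, 0) is rotated to (0, 1, 0), shifted to (1, 1, 0), and scaled to (2, 2, 0) in S. `slam_to_marker` maps it back to (1, 0, 0).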

    argmin_{R,T,s} sum(d(f'_i, sim*f_i))

    where the sum runs over i = 1, ..., N (N = 3 in the figure), and d denotes a distance (I'd say Euclidean) between each feature f'_i in S and its match f_i from G after mapping by sim. Of course, this is a broad formulation, but I think it can serve as a general basis for your solution. Note, however, that it helps a lot to have a prior on R, T, s before minimizing such a cost function, since minimization algorithms for this kind of problem tend to get stuck in local minima.
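
    Consider the special case where d is the squared Euclidean distance and the matches are known and correct. This minimization then has a closed-form global solution (Umeyama's least-squares method, 1991), so local minima do not arise. The sketch below uses that swapped-in technique; it is not the answer's own procedure. It assumes NumPy and at least three non-collinear matched features, and the function names are hypothetical.

```python
import numpy as np

def estimate_similarity(marker_pts, slam_pts):
    """Closed-form least-squares similarity (Umeyama, 1991).

    Finds s, R, T minimising sum_i ||f'_i - s * (R @ f_i + T)||^2, i.e. the cost
    above with d = squared Euclidean distance and sim = [sR sT; 0 1].
    marker_pts: Nx3 features f_i in G (N >= 3, not all collinear).
    slam_pts:   Nx3 matched features f'_i in S, in the same order.
    """
    X = np.asarray(marker_pts, dtype=float)
    Y = np.asarray(slam_pts, dtype=float)
    mu_x, mu_y = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mu_x, Y - mu_y
    var_x = (Xc ** 2).sum() / len(X)           # mean squared distance to centroid
    cov = Yc.T @ Xc / len(X)                   # 3x3 cross-covariance
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0                         # force a proper rotation (no reflection)
    R = U @ S @ Vt
    s = np.trace(np.diag(D) @ S) / var_x
    t = mu_y - s * R @ mu_x                    # translation in the form s*R*f + t
    return s, R, t / s                         # convert to the sT convention

def alignment_cost(s, R, T, marker_pts, slam_pts):
    """Sum of squared Euclidean distances between f'_i and sim * f_i."""
    pred = s * (np.asarray(marker_pts, dtype=float) @ R.T + T)
    return float(((np.asarray(slam_pts, dtype=float) - pred) ** 2).sum())

# Synthetic check with six marker features and a known similarity.
rng = np.random.default_rng(0)
marker = rng.uniform(-0.1, 0.1, size=(6, 3))
a = 0.3
R_true = np.array([[np.cos(a), -np.sin(a), 0.0],
                   [np.sin(a), np.cos(a), 0.0],
                   [0.0, 0.0, 1.0]])
s_true, T_true = 4.0, np.array([0.5, -0.2, 1.0])
slam = s_true * (marker @ R_true.T + T_true)
s_est, R_est, T_est = estimate_similarity(marker, slam)
```

    With noisy real matches the recovered parameters will not be exact, but they are the global minimum of the squared-distance cost. They therefore also make a good prior if you later switch to a different distance or add constraints. The method assumes every match is correct. To express SLAM points in the marker frame, invert the similarity: f = R^T (f'/s - T).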