PointCloud from two undistorted images

I want to do some Structure from Motion using OpenCV. This should happen on Android. I currently have the cameraMatrix (intrinsic parameters) and the distortion coefficients from the camera calibration.

The user should now take two images of a building, and the app should generate a point cloud. Note: the user may also rotate the smartphone camera a little while moving along one side of the building.

At the current point, I have the following information:

the undistorted left image
the undistorted right image
a list of good matches using SIFT
the homography matrix
the fundamental matrix

I've searched the internet and now I am very confused about how to proceed. Some say I need to use stereoRectify to get Q and then pass Q to reprojectImageTo3D() to get the point cloud.

Others say that I need to use stereoRectifyUncalibrated and use H1 and H2 from this method to fill all the parameters of triangulatePoints. triangulatePoints needs the projection matrix of each camera/image, but from my understanding this approach seems definitely wrong.

So for me there are some problems:

How do I get R and T (rotation and translation) from all the information I already have?
If I use stereoRectify, the first four parameters are cameraMatrix1, distortionCoeff1, cameraMatrix2 and distortionCoeff2. If I do not have a stereo camera like the Kinect, are cameraMatrix1 and cameraMatrix2 equal for my setup (a mono camera on a smartphone)?
How can I obtain Q? (I guess that if I have R and T I can get it from stereoRectify.)
Is there another way of getting the projection matrices for each camera so I can use the triangulation method provided by OpenCV?

I know these are a lot of questions, but googling confused me, so I need to get this straight. I hope someone can help me with my problems.

Thanks

PS: As these are more theoretical questions, I did not post any code. If you want or need to see code or the values of my camera calibration, just ask and I will add them to my post.

Solution

I wrote something about using Farneback's optical flow for Structure from Motion before. You can read the details here.

But here's the code snippet. It's a somewhat working, but not great, implementation. I hope you can use it as a reference.

/* Try to find essential matrix from the points */
Mat fundamental = findFundamentalMat( left_points, right_points, FM_RANSAC, 0.2, 0.99 );
Mat essential   = cam_matrix.t() * fundamental * cam_matrix;

/* Find the projection matrix between those two images */
SVD svd( essential );
static const Mat W = (Mat_<double>(3, 3) <<
                     0, -1, 0,
                     1, 0, 0,
                     0, 0, 1);

static const Mat W_inv = W.inv();

Mat_<double> R1 = svd.u * W * svd.vt;
Mat_<double> T1 = svd.u.col( 2 );

Mat_<double> R2 = svd.u * W_inv * svd.vt;
Mat_<double> T2 = -svd.u.col( 2 );

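/* The points are in pixel coordinates, so both projection matrices
   include the intrinsics: P1 = K [I | 0], P2 = K [R | T] */
const Mat P1 = cam_matrix * Mat::eye(3, 4, CV_64FC1);
Mat Rt = (Mat_<double>(3, 4) <<
         R1(0, 0), R1(0, 1), R1(0, 2), T1(0),
         R1(1, 0), R1(1, 1), R1(1, 2), T1(1),
         R1(2, 0), R1(2, 1), R1(2, 2), T1(2));
Mat P2 = cam_matrix * Rt;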

/*  Triangulate the points to find the 3D homogeneous points in the world space
    Note that each column of the 'out' matrix corresponds to one 3D homogeneous point
 */
Mat out;
triangulatePoints( P1, P2, left_points, right_points, out );

/* Since these are homogeneous (x, y, z, w) coordinates, divide by w to get (x, y, z, 1) */
vector<Mat> splitted = {
    out.row(0) / out.row(3),
    out.row(1) / out.row(3),
    out.row(2) / out.row(3)
};

merge( splitted, out );

return out;
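
The snippet above computes two rotation candidates (R1, R2) and two translation candidates (T1, T2), but builds P2 from R1 and T1 only. A common way to resolve this ambiguity is to triangulate with each of the four (R, T) combinations and keep the one that puts the most points in front of both cameras. Below is a minimal sketch of that selection step. It is plain C++ without OpenCV so that it stays self-contained, and the names PoseCandidate, depthInSecondCamera, countInFront and selectPose are hypothetical helpers, not part of any library. It assumes the points have already been divided by w. OpenCV 3.0 and later also offer recoverPose, which performs this kind of check given the essential matrix.

```cpp
#include <array>
#include <cstddef>
#include <vector>

using Vec3 = std::array<double, 3>;
using Mat3 = std::array<std::array<double, 3>, 3>;

// Hypothetical container: one (R, t) candidate plus the points that were
// triangulated with P1 = [I | 0] and P2 = [R | t] (Euclidean, after dividing by w).
struct PoseCandidate {
    Mat3 R;
    Vec3 t;
    std::vector<Vec3> points;
};

// Depth of X in the second camera: third component of R * X + t.
double depthInSecondCamera(const Mat3& R, const Vec3& t, const Vec3& X) {
    return R[2][0] * X[0] + R[2][1] * X[1] + R[2][2] * X[2] + t[2];
}

// Number of points with positive depth in both cameras.
// The first camera sits at the origin, so its depth is simply X[2].
std::size_t countInFront(const PoseCandidate& c) {
    std::size_t count = 0;
    for (const Vec3& X : c.points) {
        if (X[2] > 0.0 && depthInSecondCamera(c.R, c.t, X) > 0.0) {
            ++count;
        }
    }
    return count;
}

// Index of the candidate with the most points in front of both cameras
// (0 if the list is empty or no candidate has any such point).
std::size_t selectPose(const std::vector<PoseCandidate>& candidates) {
    std::size_t best = 0;
    std::size_t bestCount = 0;
    for (std::size_t i = 0; i < candidates.size(); ++i) {
        const std::size_t n = countInFront(candidates[i]);
        if (n > bestCount) {
            bestCount = n;
            best = i;
        }
    }
    return best;
}
```

Applied to the code above, this would mean calling triangulatePoints four times, once for each pairing of R1/R2 with T1/T2, and keeping the P2 that belongs to the index returned by selectPose.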