Tags: opencv, 3d, computer-vision, geometry, camera-calibration

Correct non-nadir view for GSD calculation (UAV)


Hello stackoverflow community,

So I am working on a project that requires calculating the ground sampling distance (GSD) in order to retrieve the meter-per-pixel scale.

The GSD formula for a nadir view (camera looking straight down at the ground) is as follows:

GSD = (flight altitude × sensor height) / (focal length × image height and/or width)
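
For example, with made-up numbers (not my actual setup), this is what I mean:

    # Nadir GSD: ground distance covered by one pixel (all values are made-up examples)
    flight_altitude = 100.0    # metres
    sensor_height = 4.8e-3     # metres
    focal_length = 8.8e-3      # metres
    image_height = 3000        # pixels

    gsd = (flight_altitude * sensor_height) / (focal_length * image_height)
    print(gsd)   # ~0.018 metres per pixel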

I read in multiple articles, such as https://www.mdpi.com/2072-4292/13/4/573, that if the camera has a tilt angle on one axis, a correction like the following is required:

(image from the article: GSD ratio correction formula)

where θ is the tilt angle, and φ is described as follows in the article:

φ describes the angular position of the pixel in the image: it is zero in correspondence of the optical axis of the camera, while it can have positive or negative values for the other pixels

The figure in their article illustrates this tilted-view geometry.

So I hope you are on the same page as me. Now I have two questions:

1- First, how exactly do I calculate the angular position of a given pixel with respect to the optical axis (i.e., how do I calculate φ)?

2- The camera in my case is rotated on two axes, not just one as in their example: it doesn't look straight at the road but is also oriented toward one of the sides.

So would there be further changes to the formula? I am not sure how to derive the right formula geometrically.


Solution

    1. The angular position of a pixel

    As explained in the article you linked, you can compute the pixel angle if you know the camera's intrinsic parameters. First, a bit of theory: the intrinsics matrix is used to compute the projection of a world point onto the image plane of the camera. The OpenCV documentation explains it very well; it is expressed like this:

        ( x )   ( fx 0  cx )   ( X )
    s * ( y ) = ( 0  fy cy ) * ( Y )
        ( 1 )   ( 0  0  1  )   ( Z )
    

    where fx, fy are your focal lengths, cx, cy is the optical centre, x, y is the position of the pixel in your image, and X, Y, Z is your world point in metres or millimetres or whatever unit you like.
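
    For illustration, here is that projection written out in NumPy (all the numbers below are arbitrary example values, not a real calibration):

        import numpy as np

        # Example intrinsics (arbitrary values, replace with your calibration)
        fx, fy = 1000.0, 1000.0       # focal lengths in pixels
        cx, cy = 960.0, 540.0         # optical centre in pixels
        K = np.array([[fx, 0.0, cx],
                      [0.0, fy, cy],
                      [0.0, 0.0, 1.0]])

        # A world point expressed in the camera frame (OpenCV convention: X right, Y down, Z forward)
        P = np.array([2.0, 1.0, 10.0])

        # Projection: s * (x, y, 1) = K @ (X, Y, Z), with s = Z
        p = K @ P
        x, y = p[0] / p[2], p[1] / p[2]   # pixel coordinates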

    Now, by inverting the intrinsics matrix (noted A below), you can instead compute the world vector from the pixel position. A world vector and not a world point, because the distance d between the camera and the real object is unknown.

        ( X )          ( x )
    d * ( Y ) = A^-1 * ( y )
        ( Z )          ( 1 )
    

    And then you can simply compute the angle between the optical axis and this world vector to get your φ angle, for example with the formula detailed in this answer, using the y-axis of the camera as the normal. In pseudo-code:

    intrinsic_inv = invert(intrinsic)
    world_vector = multiply(intrinsic_inv, (x, y, 1))
    optical_axis = (0, 0, 1)

    # signed angle between world_vector and optical_axis, measured around the camera y-axis
    normal = (0, 1, 0)
    dot = dot_product(world_vector, optical_axis)
    det = dot_product(normal, cross_product(world_vector, optical_axis))
    phi = atan2(det, dot)
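
    In case it helps, here is a small runnable NumPy version of the same idea (the intrinsics and the pixel coordinates are arbitrary example values):

        import numpy as np

        # Example intrinsics (replace with your calibration)
        K = np.array([[1000.0, 0.0, 960.0],
                      [0.0, 1000.0, 540.0],
                      [0.0, 0.0, 1.0]])
        K_inv = np.linalg.inv(K)

        x, y = 1500.0, 700.0                            # example pixel
        world_vector = K_inv @ np.array([x, y, 1.0])    # back-projected ray direction
        optical_axis = np.array([0.0, 0.0, 1.0])
        normal = np.array([0.0, 1.0, 0.0])              # camera y-axis as the reference normal

        dot = np.dot(world_vector, optical_axis)
        det = np.dot(normal, np.cross(world_vector, optical_axis))
        phi = np.arctan2(det, dot)                      # signed angle in radians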
    
    2. The camera angles

    You can express the rotation of the camera by three angles: the tilt, the pan, and the roll angles. Take a look at this image I quickly googled if you want to visualize what they correspond to.

    The tilt angle is the one named θ in your article; you already know it. The pan angle doesn't have an impact on the GSD, at least if we assume that the ground is perfectly flat. If the pan angle is what you were referring to with the second rotation axis, then you have nothing more to do.

    However, if you have a non-zero roll angle, this becomes tricky. If you are in that case, I would recommend a paradigm change to avoid dealing with angles. You can instead express the camera pose using an affine transformation (rotation matrix and translation vector). This turns the problem into a general analytical geometry problem, where depths and scales are estimated by intersecting the world vector with the ground plane. It would change the previous pseudo-code to give something like:

    intrinsic_inv = invert(intrinsic)
    world_vector = multiply(rotation, multiply(intrinsic_inv, (x, y, 1)))    # pixel ray direction in world coordinates
    camera_position = translation                                           # ray origin in world coordinates
    world_point = intersection(camera_position, world_vector, ground_plane) # where the ray hits the ground
    

    And then the scale can be computed by taking the differences between the world points of adjacent pixels, as in the sketch below.
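
    To make that concrete, here is a minimal runnable sketch of this approach, assuming a flat ground plane at Z = 0 in world coordinates and arbitrary example values for the intrinsics, tilt, pan and altitude (roll is left out to keep it short):

        import numpy as np

        # Example intrinsics (replace with your calibration)
        K = np.array([[1000.0, 0.0, 960.0],
                      [0.0, 1000.0, 540.0],
                      [0.0, 0.0, 1.0]])
        K_inv = np.linalg.inv(K)

        # Example extrinsics (arbitrary values): world frame with Z up, ground plane at Z = 0,
        # camera 100 m above the ground. The camera-to-world rotation is built from a nadir
        # orientation (looking straight down), then tilted and panned.
        tilt, pan = np.radians(30.0), np.radians(15.0)
        R_nadir = np.array([[1.0, 0.0, 0.0],     # camera X -> world X
                            [0.0, -1.0, 0.0],    # camera Y (down in image) -> world -Y
                            [0.0, 0.0, -1.0]])   # camera Z (optical axis) -> straight down
        R_tilt = np.array([[1.0, 0.0, 0.0],
                           [0.0, np.cos(tilt), -np.sin(tilt)],
                           [0.0, np.sin(tilt), np.cos(tilt)]])
        R_pan = np.array([[np.cos(pan), -np.sin(pan), 0.0],
                          [np.sin(pan), np.cos(pan), 0.0],
                          [0.0, 0.0, 1.0]])
        rotation = R_pan @ R_tilt @ R_nadir           # camera-to-world rotation
        translation = np.array([0.0, 0.0, 100.0])     # camera position in world coordinates

        def pixel_to_ground(x, y):
            """Intersect the ray through pixel (x, y) with the ground plane Z = 0."""
            direction = rotation @ (K_inv @ np.array([x, y, 1.0]))
            t = -translation[2] / direction[2]        # solve translation_z + t * direction_z = 0
            return translation + t * direction

        # Local GSD at a pixel: ground distance between this pixel and its neighbours
        x, y = 1500.0, 700.0
        gsd_x = np.linalg.norm(pixel_to_ground(x + 1.0, y) - pixel_to_ground(x, y))
        gsd_y = np.linalg.norm(pixel_to_ground(x, y + 1.0) - pixel_to_ground(x, y))
        print(gsd_x, gsd_y)   # metres per pixel along the image x and y directions

    Note that with a tilted camera the GSD is no longer constant: it changes from pixel to pixel, so you evaluate it at whichever pixel you need the scale for.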