python computer-vision linear-algebra scikit-image

Why does it seem necessary to rotate transformation matrix for mapping coordinates with scikit image?

I have a set of points that are effectively 3 vertices of a 45-45-90 right triangle and some other points, a, that should map to them.

import numpy as np

points = np.array([
    ( 90, 416),
    (398, 390),
    (374,  84)
])

a = np.array([
    (0, 1),  # maps to (90, 416)
    (1, 1),  # maps to (398, 390)
    (1, 0)   # maps to (374, 84)
])

I want to find the similarity transformation that properly maps a to points.

from skimage import transform

# transformation that makes sense to me
T1 = transform.estimate_transform(
    ttype="similarity",
    src=a,
    dst=points
)

# invert the rotation for no reason
# other than to show that it works
T2 = transform.SimilarityTransform(
    scale=T1.scale,
    rotation=-T1.rotation,
    translation=T1.translation
)

# apply transformations via matrix multiplication
a_T1 = a @ T1.params[:2, :2] + T1.params[:2, 2]
a_T2 = a @ T2.params[:2, :2] + T2.params[:2, 2]

Why is it that T2 (where I just inverted the rotation for no real reason other than I eventually found out that it works) yields the better mapping? Or am I making a dumb mistake in my implementation?

Solution

It's not you. There's lots of different conventions around transformation matrices and it just so happens that

skimage uses (M @ x.T).T as the convention
In this case, T2.params[:2, :2] is equal to T1.params[:2, :2].T (This is not generally true, if you have skew.)

But, I personally am always confused by the conventions here myself. To figure it out I had to know that skimage transforms have a __call__ method, so you can just apply them with T1(a), which gives the correct results. So now it was just a matter of looking at the source code and seeing that it calls _apply_mat, which looks like this:

    def _apply_mat(self, coords, matrix):
        ndim = matrix.shape[0] - 1
        coords = np.array(coords, copy=False, ndmin=2)

        src = np.concatenate([coords, np.ones((coords.shape[0], 1))], axis=1)
        dst = src @ matrix.T

        # below, we will divide by the last dimension of the homogeneous
        # coordinate matrix. In order to avoid division by zero,
        # we replace exact zeros in this column with a very small number.
        dst[dst[:, ndim] == 0, ndim] = np.finfo(float).eps
        # rescale to homogeneous coordinates
        dst[:, :ndim] /= dst[:, ndim : ndim + 1]

        return dst[:, :ndim]

For your transform (which is affine), this is essentially equivalent to (T1.params[:2, :2] @ a.T).T + T1.params[:2, 2], which is equal to a @ T1.params[:2, :2].T + T1.params[:2, 2], again, the transpose of what you were computing and equal to a @ T2.params[:2, :2] + T2.params[:2, 2].

Even though it's confusing to have multiple conventions out there, I hope this clarifies why things were looking wonky!