Tags: python, numpy, computer-vision, scipy, affinetransform

Image Registration and affine transformation in Python


I have been reading Programming Computer Vision with Python by Jan Erik Solem, which is a pretty good book, but I haven't been able to resolve a question about image registration.

Basically, we have a set of images (faces) that need to be aligned, so the first step is to perform a rigid transformation via a similarity transformation:

x' = | sR t | x
     | 0  1 |

where x is the vector (a set of coordinates in this case) to be transformed into x' via a rotation R, a translation t and possibly a scaling s.

Solem computes this rigid transformation for each image. The function returns the rotation matrix R and the translation components tx and ty:
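To make the formula concrete, here is a minimal sketch that builds the 3x3 homogeneous matrix and applies it to a few points. The helper name similarity_matrix and the sample values are my own, purely for illustration; they are not from the book.

```python
import numpy as np

def similarity_matrix(s, theta, t):
    """Build the 3x3 matrix [[s*R, t], [0, 1]] (hypothetical helper, not from the book)."""
    c, sn = np.cos(theta), np.sin(theta)
    R = np.array([[c, -sn],
                  [sn,  c]])
    H = np.eye(3)
    H[:2, :2] = s * R   # scaled rotation block
    H[:2, 2] = t        # translation column
    return H

# Three points stored as columns in homogeneous coordinates (x, y, 1):
# (1, 0), (0, 1) and (1, 1).
x = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 1.0]])

# Scale by 2, rotate by 90 degrees, then translate by (10, 20).
H = similarity_matrix(s=2.0, theta=np.pi / 2, t=[10.0, 20.0])
x_prime = H @ x
print(np.round(x_prime[:2], 6))
```

Each column of x_prime is the transformed point; the last row stays 1, which is what lets the translation be folded into a single matrix product.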

R,tx,ty = compute_rigid_transform(refpoints, points)

However, he reorders the elements of R for some reason:

T = array([[R[1][1], R[1][0]], [R[0][1], R[0][0]]])

and later he performs an affine transformation:

im2[:,:,i] = ndimage.affine_transform(im[:,:,i],linalg.inv(T),offset=[-ty,-tx])

In this example, this affine transformation is performed on each channel but that's not relevant. im[:,:,i] is the image to be processed and this procedure returns another image.
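One thing that may help when reading that call: according to the SciPy documentation, ndimage.affine_transform works "backwards". For each output pixel index o it samples the input at matrix @ o + offset, with indices in array axis order (row, column). The tiny example below is only a sketch with a made-up 5x5 image to illustrate that convention; it is not the book's code.

```python
import numpy as np
from scipy import ndimage

# Made-up test image: a single bright pixel at row 1, column 2.
im = np.zeros((5, 5))
im[1, 2] = 1.0

# Identity matrix with offset (1, 0) in (row, column) order:
# out[r, c] is sampled from im[r + 1, c].
out = ndimage.affine_transform(im, np.eye(2), offset=[1, 0], order=0)

# The bright pixel appears one row higher in the output.
print(np.argwhere(out == 1.0))
```

Because the function pulls values from the input rather than pushing them to the output, moving image content by some forward transformation generally means passing the inverse of that transformation, and the offset has to be given in (row, column) order rather than (x, y).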

What is T and why are we inverting that matrix in the affine transformation? And what are the usual steps to achieve image registration?

Update

Here you can find the relevant part of this code in Google Books. It starts at the bottom of page 67.


Solution

It looks like an error in the code to me. T appears to be just the transpose of R, which for a rotation matrix is the same as the inverse. He then takes the inverse (again) in the call to ndimage.affine_transform. I think either T or linalg.inv(R) should be passed to that function.
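The claim about T is easy to check numerically. The sketch below assumes R is a pure 2x2 rotation (I picked 30 degrees arbitrarily); the permutation matrix P is something I introduced for the comparison and does not appear in the book.

```python
import numpy as np

theta = np.deg2rad(30)  # arbitrary angle, for illustration only
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

# The reordering from the question.
T = np.array([[R[1][1], R[1][0]], [R[0][1], R[0][0]]])

# For a pure rotation, T matches both the transpose and the inverse of R.
print(np.allclose(T, R.T))
print(np.allclose(T, np.linalg.inv(R)))

# Inverting T therefore gives back R itself.
print(np.allclose(np.linalg.inv(T), R))

# The same reordering is what swapping the two coordinate axes produces.
P = np.array([[0, 1], [1, 0]])
print(np.allclose(T, P @ R @ P))
```

Two caveats. If R also carries a scale factor s, T is still the transpose, but it is no longer the inverse. And since T is also what you get by rewriting R in (row, column) instead of (x, y) order, this check alone may not settle whether the reordering in the book is a mistake or an axis-order conversion.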