python, numpy, opencv, image-processing, scipy

How to stretch an image along an arbitrary line or around an arbitrary point using Python?


I have a 128x128 image stored as a 2D numpy.ndarray (it's effectively a heatmap, so each entry is just a scalar value). I have identified:

  • one point on my image, P = (x0, y0)
  • a direction v = [v0, v1]
  • a line L, which passes through P and is perpendicular to v
  • a scale factor s (suppose for concreteness that s is a percentage)

I want to stretch my image away from the line L, i.e. along the direction of v, by s. What I mean by "stretch away from L" is that points on L should remain invariant under the transformation. The following diagram depicts the case where s is positive, so all points move away from L:

Diagram showing the stretch away from the line L

If s were negative, then I'd want to move all points closer to L.

If L passed through the origin, then this would just be a simple linear transformation, and I could use a normalized v times (1 + s) and some unit vector along L as my basis vectors. However, because L does not necessarily pass through the origin, I'm under the impression that this is some type of affine transformation.

Some preferred qualities for a solution:

  • I would prefer to just modify the contents of the image and keep it the same size, rather than resizing the image in any way
  • the "field of view" of the image should stay the same, i.e. L should stay in the same place in the image before and after the transformation

The first thing I tried was some type of image resizing, using cv2.resize. However, this only allows linear transformations where L passes through the origin, and it also requires me to resize the actual image itself.

The next thing I tried was using cv2.getAffineTransform along with cv2.warpAffine. Since cv2.getAffineTransform takes two sets of points (one before, one after), I decided to take two collinear points and calculate their midpoint (these are the 'before' points), then scale the original two points away from the midpoint to get the 'after' points. An image of the process:

Pictoral representation of what my code is doing

My code:

import cv2
import numpy as np

def get_transformation_matrix(pt1: tuple[float, float], pt2: tuple[float, float], scale_factor: float) -> np.ndarray:
    """Returns the transformation matrix to scale an image along the line containing `pt1` and `pt2`,
    centered at their midpoint, by `scale_factor` (a percentage increase/decrease)."""
    # half the vector from pt1 to pt2,
    # i.e. the vector from pt1 to the midpoint (and from the midpoint to pt2)
    slope_vector = np.array([pt2[0] - pt1[0], pt2[1] - pt1[1]]) / 2

    # midpoint of the two given points
    center = np.array([(pt1[0] + pt2[0]) / 2,
                       (pt1[1] + pt2[1]) / 2])

    # new points are just center ± resized slope_vector
    new_pt1 = center - slope_vector * (1 + scale_factor / 100)
    new_pt2 = center + slope_vector * (1 + scale_factor / 100)

    initial_points = np.float32([pt1, center, pt2])
    final_points = np.float32([new_pt1, center, new_pt2])

    return cv2.getAffineTransform(initial_points, final_points)

M = get_transformation_matrix((100, 100), (200, 100), 10)
print(M)
dst = cv2.warpAffine(data, M, data.shape[::-1])  # dsize is (width, height)

However, the M returned was just [[0. 0. 0.] [0. 0. 0.]], and dst was a completely black image. I'm not sure what went wrong, but every tutorial I found used a triangle for the initial and final points with cv2.getAffineTransform, so maybe using collinear points caused an issue?

The final thing I looked at is scipy.ndimage.affine_transform, which looks promising but requires me to know the transformation matrix and offset, and I'm not really sure how to find them.

Finally, I'm adding this subquestion in this post because it's related to the above: suppose that instead of stretching away from an invariant line L, I want to do a uniform dilation about the point P. I assume that this is going to be very similar to the case with the line, with the exception that the basis vector along L will no longer be a unit vector, but I figured I might as well confirm.

Apologies if this question has been asked before: I looked around and didn't find much, but it's possible that I just don't know what search terms to use.


Solution

  • Using the first image from your example:

    import cv2
    import numpy as np
    
    P = np.array([158, 298]) # w, h - coordinates from the image
    V = np.array([251, 155]) - P
    
    s = np.linalg.norm(V) # length of V
    

    Basically, what you need to do is:

    1. translate the image so that P is at the origin (0, 0);
    2. rotate the image so that V is parallel to one of the axes (I will use the Y axis);
    3. scale the image along the Y axis (or along the X axis, if you chose that one in the previous step);
    4. rotate the image back, reverting step 2;
    5. translate the image back, reverting step 1.

    All these steps can be expressed as affine transformation matrices and combined into a single transformation matrix; the standard translation, rotation, and scaling matrices are all we need.

    The first (translation) matrix would be:

    M_translate_forward = np.array([[1, 0, -P[0]],
                                    [0, 1, -P[1]],
                                    [0, 0, 1   ]])
    

    The second (rotation):

    # keep the signs of V's components (no abs), so the rotation
    # works for a direction pointing into any quadrant
    cos_phi = V[1] / s
    sin_phi = V[0] / s
    M_rotate_forward = np.array([[cos_phi, -sin_phi, 0], 
                                 [sin_phi,  cos_phi, 0], 
                                 [      0,        0, 1]])
    

    The third (scaling along the Y axis; I hard-coded a smaller scale factor, because using the actual vector length from the picture produces too much scaling):

    scale_W = 1
    scale_H = 1.5 #s
    M_scale = np.array([[scale_W,       0, 0], 
                        [      0, scale_H, 0], 
                        [      0,       0, 1]])
    

    Now we have to rotate and translate the result back to its original angle and position. We don't have to write these matrices manually; we can just invert the ones we created in the previous steps:

    M_rotate_backward = np.linalg.inv(M_rotate_forward)
    
    M_translate_backward = np.linalg.inv(M_translate_forward)
    

    To combine them all together, we use matrix multiplication:

    M = M_translate_backward @ M_rotate_backward @ M_scale @ M_rotate_forward @ M_translate_forward
    

    We can also use the (AB)⁻¹ = B⁻¹A⁻¹ property of matrix inverses: the leftmost product, M_translate_backward @ M_rotate_backward, is exactly the inverse of the rightmost product, M_rotate_forward @ M_translate_forward. So we can assemble the final matrix the following way instead, which is somewhat more efficient because it saves one inversion and one multiplication:

    M_prepare = M_rotate_forward @ M_translate_forward
    M_undo = np.linalg.inv(M_prepare)
    M = M_undo @ M_scale @ M_prepare
    
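
    A quick sanity check of the combined matrix, as a standalone sketch using the P, V, and scale values from above: every point on L must be a fixed point of M, and a point displaced from L along V must end up displaced by the scale factor times as much.

```python
import numpy as np

P = np.array([158.0, 298.0])
V = np.array([251.0, 155.0]) - P
Vn = V / np.linalg.norm(V)

# same construction as above: translate P to the origin,
# rotate V onto the Y axis, scale Y, then undo
cos_phi, sin_phi = Vn[1], Vn[0]
M_prepare = np.array([[cos_phi, -sin_phi, 0],
                      [sin_phi,  cos_phi, 0],
                      [0,        0,       1.0]]) \
          @ np.array([[1, 0, -P[0]],
                      [0, 1, -P[1]],
                      [0, 0, 1.0]])
M = np.linalg.inv(M_prepare) @ np.diag([1.0, 1.5, 1.0]) @ M_prepare

L_dir = np.array([-Vn[1], Vn[0]])           # unit vector along L (perpendicular to V)
on_line = np.append(P + 40 * L_dir, 1.0)    # homogeneous point on L
off_line = np.append(P + V, 1.0)            # P displaced along V

assert np.allclose(M @ on_line, on_line)                # points on L are invariant
assert np.allclose((M @ off_line)[:2], P + 1.5 * V)     # stretched by 1.5 along V
```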

    Now call warpAffine with the first two rows of M. Note that warpAffine expects the output size as (width, height), so the components of shape have to be re-ordered:

    img = cv2.imread("img.png")
    img2 = cv2.warpAffine(img, M[:2], (img.shape[1], img.shape[0]))
    cv2.imwrite("out.png", img2)
    
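
    The same matrix also answers the scipy.ndimage.affine_transform part of the question. That function maps output coordinates back to input coordinates, and it works in array index order (row, col) = (y, x), so you pass the inverse of M with the axes swapped. A minimal sketch (the P, V, and scale values here are placeholders for a 128x128 heatmap):

```python
import numpy as np
from scipy import ndimage

P = np.array([64.0, 48.0])       # pivot (x, y)
V = np.array([3.0, 4.0])         # stretch direction
Vn = V / np.linalg.norm(V)
s = 1.5

# same forward matrix M as in the OpenCV version
cos_phi, sin_phi = Vn[1], Vn[0]
M_prepare = np.array([[cos_phi, -sin_phi, 0],
                      [sin_phi,  cos_phi, 0],
                      [0, 0, 1.0]]) \
          @ np.array([[1, 0, -P[0]],
                      [0, 1, -P[1]],
                      [0, 0, 1.0]])
M = np.linalg.inv(M_prepare) @ np.diag([1.0, s, 1.0]) @ M_prepare

# affine_transform wants the OUTPUT -> INPUT mapping, in (row, col) order:
# take the inverse of M and swap the x/y axes
Minv = np.linalg.inv(M)
swap = np.array([[0, 1], [1, 0]])
matrix = swap @ Minv[:2, :2] @ swap
offset = swap @ Minv[:2, 2]

img = np.zeros((128, 128))
img[int(P[1]), int(P[0])] = 1.0   # a single hot pixel at the pivot
out = ndimage.affine_transform(img, matrix, offset=offset, order=1)
```

    Since P is a fixed point of the mapping, the hot pixel stays where it is, which is a convenient way to check the matrix/offset bookkeeping.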

    The entire code example is below; only the process_image() function is needed, the rest is for drawing and testing. Run it like python code.py image.jpg, then click, drag, and release the left mouse button in the image window: it will draw the vector and the line segment, display the scaled image in a second window, and write out_000.jpg through out_003.jpg with the scaling factor progressing from 1 to 2.5, so you can check that the scaling actually happens around the specified line.

    import cv2
    import numpy as np
    import sys
    
    def get_x(P, V, y):
        return int(-V[1] * (y - P[1]) / V[0]) + P[0]
    
    def get_y(P, V, x):
        return int(-V[0] * (x - P[0]) / V[1]) + P[1]
    
    # the actual scaling function - the rest is for drawing and testing
    def process_image(img, P, V, S):
    
        # translate the pivot to the origin
        M_translate_forward = np.array([[1, 0, -P[0]],
                                        [0, 1, -P[1]],
                                        [0, 0, 1   ]], dtype = float)
    
        # rotate so that the translation direction is parallel to the Y axis
        Vn = V / np.linalg.norm(V)
        cos_phi = Vn[1]
        sin_phi = Vn[0]
        
        M_rotate_forward = np.array([[cos_phi, -sin_phi, 0], 
                                     [sin_phi,  cos_phi, 0], 
                                     [      0,        0, 1]])
    
        # preparatory transformation before the scaling
        M_prepare = M_rotate_forward @ M_translate_forward
    
        # scale in the Y axis direction
        scale_W = 1
        scale_H = S
        M_scale = np.array([[scale_W,       0, 0], 
                            [      0, scale_H, 0], 
                            [      0,       0, 1]])
    
        # undo the preparatory transformation after the scaling
        M_undo = np.linalg.inv(M_prepare)
    
        # all the transformations combined
        M = M_undo @ M_scale @ M_prepare
        
        return cv2.warpAffine(img, M[:2], img.shape[:2][::-1])
    
    v_origin = None
    v_direction = None
    
    img_tmp = None
    def on_mouse_click(event, x, y, flags, param):
        global v_origin, v_direction, img_src, img_tmp
        if event == cv2.EVENT_LBUTTONDOWN:
            img_tmp = img_src.copy()
            v_origin = (x, y)
            cv2.circle(img_tmp, v_origin, 2, (0, 0, 255), cv2.FILLED)
            cv2.imshow("Source", img_tmp)
        elif event == cv2.EVENT_LBUTTONUP:
            v_direction = (x, y)
            P = np.array(v_origin)
            V = np.array(v_direction) - P
            if all([a == b for a, b in zip(v_origin, v_direction)]): return
            
            cv2.arrowedLine(img_tmp, v_origin, v_direction, (0, 0, 255), 5)
            
            if abs(V[1]) > abs(V[0]):
                x0, x1 = P[0] - 100, P[0] + 100
                y0, y1 = (get_y(P, V, x) for x in (x0, x1))
            else:
                y0, y1 = P[1] - 100, P[1] + 100
                x0, x1 = (get_x(P, V, y) for y in (y0, y1))
            
            cv2.line(img_tmp, (x0, y0), (x1, y1), (255, 0, 0), 5)
            
            cv2.imshow("Source", img_tmp)
            
            cv2.imshow("Scaled", process_image(img_tmp, P, V, 1.5))
            
            cv2.imwrite("out_000.jpg", img_tmp)
            for scale, suffix in ((1.5, "001"), (2.0, "002"), (2.5, "003")):
                img2 = process_image(img_tmp, P, V, scale)
                fname = f"out_{suffix}.jpg"
                cv2.imwrite(fname, img2)
    
    img_src = cv2.imread(sys.argv[1])
    img_tmp = img_src.copy()
    
    cv2.namedWindow("Source")
    cv2.setMouseCallback("Source", on_mouse_click)
    cv2.imshow("Source", img_tmp)
    
    cv2.namedWindow("Scaled")
    
    while True:
        key = cv2.waitKey(1) & 0xFF
        if key == ord("q"): break
    
    cv2.destroyAllWindows() 
    

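
    For the dilation subquestion at the end of the post: a uniform dilation about P is the same construction with both axes scaled equally, so the rotation step drops out entirely (rotating, scaling both axes by the same factor, and rotating back is the same as just scaling). A minimal sketch:

```python
import numpy as np

P = np.array([158.0, 298.0])   # pivot point (x, y)
s = 1.5                        # uniform scale factor

# translate P to the origin, scale both axes, translate back
T = np.array([[1, 0, -P[0]],
              [0, 1, -P[1]],
              [0, 0, 1.0]])
M = np.linalg.inv(T) @ np.diag([s, s, 1.0]) @ T

# P is the only fixed point; every other point moves radially away from P
q = np.array([P[0] + 10, P[1] - 4, 1.0])
assert np.allclose((M @ q)[:2], P + s * (q[:2] - P))
```

    Pass M[:2] to cv2.warpAffine exactly as before.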