Tags: python, image, pytorch, image-rotation, affinetransform

Image translation in PyTorch, using affine_grid & grid_sample functions


I want to move the image by 1 or 2 pixels, since I specified small values (1.25, 1.9) in the affine matrix.

But the image is moved far, far away, by what looks like hundreds of pixels:

[screenshot of the shifted output]

(My input image is completely filled with yellow pineapples.)

Below is a working example.

import torch
import numpy as np
import matplotlib.pyplot as plt
from torchvision import datasets, transforms
import torch.nn.functional as F

# 2x3 affine matrix: identity rotation with a (1.25, 1.9) translation
rotation_simple = np.array([[1, 0, 1.25],
                            [0, 1, 1.9]], dtype=np.float32)
theta = torch.as_tensor(rotation_simple)[None]  # add batch dimension -> (1, 2, 3)

# load image
transform = transforms.Compose([transforms.Resize(255),
                                transforms.CenterCrop(224),
                                transforms.ToTensor()])
dataloader = torch.utils.data.DataLoader(
    datasets.ImageFolder('/home/Pictures', transform=transform), shuffle=True)

for i in range(3):
    img, labels = next(iter(dataloader))
    # img = img.double()  # sometimes the image has to be converted to double, sometimes not

    grid = F.affine_grid(theta, img.size())
    x = F.grid_sample(img, grid)

    plt.imshow(x[0].permute(1, 2, 0))
    plt.show()

I wonder why the function moves the image so far away, instead of moving it by just 1 pixel in the x and y directions.

P.S. Setting align_corners=True didn't help in this case.

P.P.S. My PyTorch version is 1.4.0+cu100.


Solution

  • The "unit of measures" for the grid and the affine transformation are not pixels, but rather normalized coordinates:

    grid specifies the sampling pixel locations normalized by the input spatial dimensions. Therefore, it should have most values in the range of [-1, 1]. For example, values x = -1, y = -1 is the left-top pixel of input, and values x = 1, y = 1 is the right-bottom pixel of input.

    Therefore, translating by [1.25, 1.9] is actually translating by a large fraction of the entire image size. To translate by t pixels, you need a normalized translation of 2*t/size, i.e. divide the pixel offset by half of the corresponding image dimension (see the sketch below).

    See the doc for grid_sample for more information.
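
Here is a minimal sketch of that pixel-to-normalized conversion. The dummy 224x224 batch, the 1- and 2-pixel target shifts, and the variable names are illustrative assumptions, not taken from the question:

import torch
import torch.nn.functional as F

img = torch.rand(1, 3, 224, 224)       # dummy batch, N x C x H x W
shift_x_px, shift_y_px = 1.0, 2.0      # desired shift in pixels (assumed values)

N, C, H, W = img.shape
# one pixel corresponds to 2/W (or 2/H) in the normalized [-1, 1] range
tx = 2 * shift_x_px / W
ty = 2 * shift_y_px / H

theta = torch.tensor([[[1.0, 0.0, tx],
                       [0.0, 1.0, ty]]])   # 1 x 2 x 3 affine matrix

grid = F.affine_grid(theta, img.size(), align_corners=False)
shifted = F.grid_sample(img, grid, align_corners=False)

Note that theta maps output coordinates to input sampling locations, so a positive translation in theta shifts the visible content in the negative direction; negate tx and ty if the content should move the other way.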