I want to move the image by 1 or 2 pixels, since I specified small translation values (1.25, 1.9) in the affine matrix.
BUT the image is moved far, far away, by hundreds of pixels:
(my input image is fully filled with yellow pineapples)
Below is a working example.
import torch
import numpy as np
import matplotlib.pyplot as plt
from torchvision import datasets, transforms
import torch.nn.functional as F
rotation_simple = np.array([[1, 0, 1.25],
                            [0, 1, 1.9]])
#load image
transform = transforms.Compose([transforms.Resize(255),
                                transforms.CenterCrop(224),
                                transforms.ToTensor()])
dataloader = torch.utils.data.DataLoader(datasets.ImageFolder('/home/Pictures',transform=transform,), shuffle=True)
dtype = torch.FloatTensor
rotation_simple = torch.as_tensor(rotation_simple, dtype=torch.float32)[None]  # add the batch dimension once, outside the loop
i = 0
while i < 3:
    img, labels = next(iter(dataloader))
    img = img  # .double()  # sometimes it needs converting to double, sometimes not
    grid = F.affine_grid(rotation_simple, img.size()).type(dtype)
    x = F.grid_sample(img, grid)
    plt.imshow(x[0].permute(1, 2, 0))
    plt.show()
    i += 1
I wonder why the function moves the image so far away instead of translating it by just 1 pixel in the x and y directions.
P.S. Setting "align_corners=True" didn't help in this case.
P.P.S. My PyTorch version is 1.4.0+cu100.
The "unit of measures" for the grid and the affine transformation are not pixels, but rather normalized coordinates:
grid
specifies the sampling pixel locations normalized by the input spatial dimensions. Therefore, it should have most values in the range of[-1, 1]
. For example, valuesx = -1, y = -1
is the left-top pixel of input, and valuesx = 1, y = 1
is the right-bottom pixel of input.
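You can see this convention concretely with a minimal check (the 1x1x4x4 size below is just a dummy stand-in): with an identity affine matrix and align_corners=True, the grid corners come out at exactly -1 and 1 regardless of the spatial size.
import torch
import torch.nn.functional as F

# Identity affine matrix (no rotation, no translation), with a batch dimension
identity = torch.tensor([[[1.0, 0.0, 0.0],
                          [0.0, 1.0, 0.0]]])

# Sampling grid for a hypothetical 1x1x4x4 image; the values are normalized
# coordinates in [-1, 1], not pixel indices
grid = F.affine_grid(identity, (1, 1, 4, 4), align_corners=True)
print(grid[0, 0, 0])    # tensor([-1., -1.])  -> left-top corner of the input
print(grid[0, -1, -1])  # tensor([1., 1.])    -> right-bottom corner of the input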
Therefore, translating by [1.25, 1.9] is actually translating by almost the entire image size. To get pixel-wise translations, you need to divide the pixel offsets by half the spatial size (equivalently, multiply by 2 / img.shape), since one pixel corresponds to 2 / size in normalized coordinates.
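For example, a sketch of a pixel-accurate translation (the random img tensor here just stands in for the batch from the question's dataloader; 224x224 matches the CenterCrop above):
import torch
import torch.nn.functional as F

img = torch.rand(1, 3, 224, 224)  # stand-in for the loaded image batch
_, _, H, W = img.shape

tx_px, ty_px = 1.25, 1.9  # desired shift in pixels

# One pixel spans 2 / size in normalized [-1, 1] coordinates (with
# align_corners=False), so divide the pixel offsets by half the spatial size
theta = torch.tensor([[[1.0, 0.0, 2 * tx_px / W],
                       [0.0, 1.0, 2 * ty_px / H]]])

grid = F.affine_grid(theta, img.size(), align_corners=False)
out = F.grid_sample(img, grid, align_corners=False)
# Note: theta maps output coordinates to input sampling locations, so a
# positive translation shifts the image content towards the top-left.
With align_corners=True the per-pixel step is 2 / (size - 1) instead of 2 / size, so the scaling changes slightly but the idea is the same.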
See the doc for grid_sample for more information.