Correct me if I am wrong. The 'classic' way to pass images through torchvision transforms is to use Compose, as in its doc page. This, however, requires passing a PIL Image as input. An alternative is to use ConvertImageDtype with torch.nn.Sequential. This bypasses the need for a PIL Image, and in my case it is much faster because I work with numpy arrays.

My problem is that the results are not identical. Below is an example with a custom Normalize. I would like to use the torch.nn.Sequential pipeline (tr) because it is faster for my needs, but the squared error compared to the Compose pipeline (tr2) is very large (~810).
from PIL import Image
import torchvision.transforms as T
import numpy as np
import torch

# random uint8 HWC image plus its PIL counterpart
o = np.random.rand(64, 64, 3) * 255
o = np.array(o, dtype=np.uint8)
i = Image.fromarray(o)

# tensor pipeline: works on a CHW uint8 tensor, no PIL needed
tr = torch.nn.Sequential(
    T.Resize(224, interpolation=T.InterpolationMode.BICUBIC),
    T.CenterCrop(224),
    T.ConvertImageDtype(torch.float),
    T.Normalize([0.48145466, 0.4578275, 0.40821073], [0.26862954, 0.26130258, 0.27577711]),
)

# PIL pipeline: resize/crop run on the PIL Image, ToTensor converts afterwards
tr2 = T.Compose([
    T.Resize(224, interpolation=T.InterpolationMode.BICUBIC),
    T.CenterCrop(224),
    T.ToTensor(),
    T.Normalize((0.48145466, 0.4578275, 0.40821073), (0.26862954, 0.26130258, 0.27577711)),
])

out = tr(torch.from_numpy(o).permute(2, 0, 1).contiguous())
out2 = tr2(i)
print(((out - out2) ** 2).sum())  # ~810 with BICUBIC
The interpolation method seems to matter A LOT: with the default BILINEAR the error drops to ~7, but I need BICUBIC. The problem seems to lie in ConvertImageDtype vs ToTensor, because if I replace ToTensor with ConvertImageDtype the results are identical, as sketched below (I cannot do it the other way around, because ToTensor is not a subclass of Module and so cannot be used inside nn.Sequential).
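To be concrete about that replacement, here is a sketch (tr3 is just my name for it): I swap ToTensor for ConvertImageDtype inside the Compose pipeline and feed it the tensor instead of the PIL Image, since ConvertImageDtype does not accept PIL input.

# same steps as tr, expressed via Compose instead of nn.Sequential
tr3 = T.Compose([
    T.Resize(224, interpolation=T.InterpolationMode.BICUBIC),
    T.CenterCrop(224),
    T.ConvertImageDtype(torch.float),
    T.Normalize((0.48145466, 0.4578275, 0.40821073), (0.26862954, 0.26130258, 0.27577711)),
])
t = torch.from_numpy(o).permute(2, 0, 1).contiguous()
print(((tr(t) - tr3(t)) ** 2).sum())  # ~0: identical once both pipelines see tensors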
However, the following (just the conversion step, with no resize) gives identical results:
tr = torch.nn.Sequential(
    T.ConvertImageDtype(torch.float),
)
tr2 = T.Compose([
    T.ToTensor(),
])
out = tr(torch.from_numpy(o).permute(2, 0, 1).contiguous())
out2 = tr2(i)
print(((out - out2) ** 2).sum())  # ~0: the conversions agree
This suggests that the resize interpolation itself produces different results, and that the difference only shows up when it is combined with ToTensor vs ConvertImageDtype, as the sketch below isolates.
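Here is a minimal sketch isolating just the resize step (reusing o and i from above); the PIL and tensor resizes already disagree before any dtype conversion happens:

resize = T.Resize(224, interpolation=T.InterpolationMode.BICUBIC)
a = T.ToTensor()(resize(i))  # resize on the PIL Image, then convert
b = T.ConvertImageDtype(torch.float)(
    resize(torch.from_numpy(o).permute(2, 0, 1).contiguous())  # resize on the tensor
)
print(((a - b) ** 2).sum())  # large: the resize itself differs, not the conversion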
Any input is appreciated.
This is documented here:

"The output image might be different depending on its type: when downsampling, the interpolation of PIL images and tensors is slightly different, because PIL applies antialiasing. This may lead to significant differences in the performance of a network. Therefore, it is preferable to train and serve a model with the same input types. See also below the antialias parameter, which can help making the output of PIL images and tensors closer."

Passing antialias=True produces almost identical results.
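For example, this variant of the tensor pipeline (reusing o, out2, and the constants from the first snippet) brings the error down to nearly zero:

# same tensor pipeline as before, but with antialiasing enabled on the resize
tr = torch.nn.Sequential(
    T.Resize(224, interpolation=T.InterpolationMode.BICUBIC, antialias=True),
    T.CenterCrop(224),
    T.ConvertImageDtype(torch.float),
    T.Normalize([0.48145466, 0.4578275, 0.40821073], [0.26862954, 0.26130258, 0.27577711]),
)
out = tr(torch.from_numpy(o).permute(2, 0, 1).contiguous())
print(((out - out2) ** 2).sum())  # now close to zero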
This is interesting because the doc says that it can be set to True for InterpolationMode.BILINEAR only. Yet I am using BICUBIC and it still works.