Search code examples
machine-learningdeep-learningcomputer-visionpytorchnormalize

Normalizing images passed to torch.transforms.Compose function


How to find the values to pass to the transforms.Normalize function in PyTorch? Also, where in my code, should I exactly do the transforms.Normalize?

Since normalizing the dataset is a pretty well-known task, I was hoping there should be some sort of script for doing that automatically. At least I couldn't find it in PyTorch forum.

transformed_dataset = MothLandmarksDataset(csv_file='moth_gt.csv',
                                           root_dir='.',
                                           transform=transforms.Compose([
                                           Rescale(256),
                                           RandomCrop(224),
                                           transforms.Normalize(mean = [ 0.485, 0.456, 0.406 ],
                                           std = [ 0.229, 0.224, 0.225 ]),
                                           ToTensor()
                                               ]))
    
for i in range(len(transformed_dataset)):
    sample = transformed_dataset[i]
    print(i, sample['image'].size(), sample['landmarks'].size())
    if i == 3:
       break

I know these current values don't pertain to my dataset and pertain to ImageNet but using them I actually get an error:

    TypeError                                 Traceback (most recent call last)
    <ipython-input-81-eb8dc46e0284> in <module>
         10 
         11 for i in range(len(transformed_dataset)):
    ---> 12     sample = transformed_dataset[i]
         13 
         14     print(i, sample['image'].size(), sample['landmarks'].size())
    
    <ipython-input-48-9d04158922fb> in __getitem__(self, idx)
         30 
         31         if self.transform:
    ---> 32             sample = self.transform(sample)
         33 
         34         return sample
    
    ~/anaconda3/lib/python3.7/site-packages/torchvision/transforms/transforms.py in __call__(self, img)
         59     def __call__(self, img):
         60         for t in self.transforms:
    ---> 61             img = t(img)
         62         return img
         63 
    
    ~/anaconda3/lib/python3.7/site-packages/torchvision/transforms/transforms.py in __call__(self, tensor)
        210             Tensor: Normalized Tensor image.
        211         """
    --> 212         return F.normalize(tensor, self.mean, self.std, self.inplace)
        213 
        214     def __repr__(self):
    
    ~/anaconda3/lib/python3.7/site-packages/torchvision/transforms/functional.py in normalize(tensor, mean, std, inplace)
        278     """
        279     if not torch.is_tensor(tensor):
    --> 280         raise TypeError('tensor should be a torch tensor. Got {}.'.format(type(tensor)))
        281 
        282     if tensor.ndimension() != 3:
    
    TypeError: tensor should be a torch tensor. Got <class 'dict'>.

So basically three questions:

  1. How can I find the similar values as in ImageNet mean and std for my own custom dataset?
  2. How to pass these values and where? I assume I should do it in transforms.Compose method but I might be wrong.
  3. I assume I should apply Normalize to my entire dataset not just the training set, am I right?

Update:

Trying the provided solution here didn't work for me: https://discuss.pytorch.org/t/about-normalization-using-pre-trained-vgg16-networks/23560/6?u=mona_jalal

mean = 0.
std = 0.
nb_samples = 0.
for data in dataloader:
    print(type(data))
    batch_samples = data.size(0)
    
    data.shape(0)
    data = data.view(batch_samples, data.size(1), -1)
    mean += data.mean(2).sum(0)
    std += data.std(2).sum(0)
    nb_samples += batch_samples

mean /= nb_samples
std /= nb_samples

error is:

<class 'dict'>

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-51-e8ba3c8718bb> in <module>
      5 for data in dataloader:
      6     print(type(data))
----> 7     batch_samples = data.size(0)
      8 
      9     data.shape(0)

AttributeError: 'dict' object has no attribute 'size'

this is print(data) result:

{'image': tensor([[[[0.2961, 0.2941, 0.2941,  ..., 0.2460, 0.2456, 0.2431],
          [0.2953, 0.2977, 0.2980,  ..., 0.2442, 0.2431, 0.2431],
          [0.2941, 0.2941, 0.2980,  ..., 0.2471, 0.2471, 0.2448],
          ...,
          [0.3216, 0.3216, 0.3216,  ..., 0.2482, 0.2471, 0.2471],
          [0.3216, 0.3241, 0.3253,  ..., 0.2471, 0.2471, 0.2450],
          [0.3216, 0.3216, 0.3216,  ..., 0.2471, 0.2452, 0.2431]],

         [[0.2961, 0.2941, 0.2941,  ..., 0.2460, 0.2456, 0.2431],
          [0.2953, 0.2977, 0.2980,  ..., 0.2442, 0.2431, 0.2431],
          [0.2941, 0.2941, 0.2980,  ..., 0.2471, 0.2471, 0.2448],
          ...,
          [0.3216, 0.3216, 0.3216,  ..., 0.2482, 0.2471, 0.2471],
          [0.3216, 0.3241, 0.3253,  ..., 0.2471, 0.2471, 0.2450],
          [0.3216, 0.3216, 0.3216,  ..., 0.2471, 0.2452, 0.2431]],

         [[0.2961, 0.2941, 0.2941,  ..., 0.2460, 0.2456, 0.2431],
          [0.2953, 0.2977, 0.2980,  ..., 0.2442, 0.2431, 0.2431],
          [0.2941, 0.2941, 0.2980,  ..., 0.2471, 0.2471, 0.2448],
          ...,
          [0.3216, 0.3216, 0.3216,  ..., 0.2482, 0.2471, 0.2471],
          [0.3216, 0.3241, 0.3253,  ..., 0.2471, 0.2471, 0.2450],
          [0.3216, 0.3216, 0.3216,  ..., 0.2471, 0.2452, 0.2431]]],


        [[[0.3059, 0.3093, 0.3140,  ..., 0.3373, 0.3363, 0.3345],
          [0.3059, 0.3093, 0.3165,  ..., 0.3412, 0.3389, 0.3373],
          [0.3098, 0.3131, 0.3176,  ..., 0.3450, 0.3412, 0.3412],
          ...,
          [0.2931, 0.2966, 0.2931,  ..., 0.2549, 0.2539, 0.2510],
          [0.2902, 0.2902, 0.2902,  ..., 0.2510, 0.2510, 0.2502],
          [0.2864, 0.2900, 0.2863,  ..., 0.2510, 0.2510, 0.2510]],

         [[0.3059, 0.3093, 0.3140,  ..., 0.3373, 0.3363, 0.3345],
          [0.3059, 0.3093, 0.3165,  ..., 0.3412, 0.3389, 0.3373],
          [0.3098, 0.3131, 0.3176,  ..., 0.3450, 0.3412, 0.3412],
          ...,
          [0.2931, 0.2966, 0.2931,  ..., 0.2549, 0.2539, 0.2510],
          [0.2902, 0.2902, 0.2902,  ..., 0.2510, 0.2510, 0.2502],
          [0.2864, 0.2900, 0.2863,  ..., 0.2510, 0.2510, 0.2510]],

         [[0.3059, 0.3093, 0.3140,  ..., 0.3373, 0.3363, 0.3345],
          [0.3059, 0.3093, 0.3165,  ..., 0.3412, 0.3389, 0.3373],
          [0.3098, 0.3131, 0.3176,  ..., 0.3450, 0.3412, 0.3412],
          ...,
          [0.2931, 0.2966, 0.2931,  ..., 0.2549, 0.2539, 0.2510],
          [0.2902, 0.2902, 0.2902,  ..., 0.2510, 0.2510, 0.2502],
          [0.2864, 0.2900, 0.2863,  ..., 0.2510, 0.2510, 0.2510]]],


        [[[0.2979, 0.2980, 0.3015,  ..., 0.2825, 0.2784, 0.2784],
          [0.2980, 0.2980, 0.2980,  ..., 0.2830, 0.2764, 0.2795],
          [0.2980, 0.2980, 0.3012,  ..., 0.2827, 0.2814, 0.2797],
          ...,
          [0.3282, 0.3293, 0.3294,  ..., 0.2238, 0.2235, 0.2235],
          [0.3255, 0.3255, 0.3255,  ..., 0.2240, 0.2235, 0.2229],
          [0.3225, 0.3255, 0.3255,  ..., 0.2216, 0.2235, 0.2223]],

         [[0.2979, 0.2980, 0.3015,  ..., 0.2825, 0.2784, 0.2784],
          [0.2980, 0.2980, 0.2980,  ..., 0.2830, 0.2764, 0.2795],
          [0.2980, 0.2980, 0.3012,  ..., 0.2827, 0.2814, 0.2797],
          ...,
          [0.3282, 0.3293, 0.3294,  ..., 0.2238, 0.2235, 0.2235],
          [0.3255, 0.3255, 0.3255,  ..., 0.2240, 0.2235, 0.2229],
          [0.3225, 0.3255, 0.3255,  ..., 0.2216, 0.2235, 0.2223]],

         [[0.2979, 0.2980, 0.3015,  ..., 0.2825, 0.2784, 0.2784],
          [0.2980, 0.2980, 0.2980,  ..., 0.2830, 0.2764, 0.2795],
          [0.2980, 0.2980, 0.3012,  ..., 0.2827, 0.2814, 0.2797],
          ...,
          [0.3282, 0.3293, 0.3294,  ..., 0.2238, 0.2235, 0.2235],
          [0.3255, 0.3255, 0.3255,  ..., 0.2240, 0.2235, 0.2229],
          [0.3225, 0.3255, 0.3255,  ..., 0.2216, 0.2235, 0.2223]]]],
       dtype=torch.float64), 'landmarks': tensor([[[160.2964,  98.7339],
         [223.0788,  72.5067],
         [ 82.4163,  70.3733],
         [152.3213, 137.7867]],

        [[198.3194,  74.4341],
         [273.7188, 118.7733],
         [117.7113,  80.8000],
         [182.0750, 107.2533]],

        [[137.4789,  92.8523],
         [174.9463,  40.3467],
         [ 57.3013,  59.1200],
         [129.3375, 131.6533]]], dtype=torch.float64)}
dataloader = DataLoader(transformed_dataset, batch_size=3,
                        shuffle=True, num_workers=4)

and

transformed_dataset = MothLandmarksDataset(csv_file='moth_gt.csv',
                                           root_dir='.',
                                           transform=transforms.Compose(
                                               [
                                               Rescale(256),
                                               RandomCrop(224),
                                               
                                               ToTensor()#,
                                               ##transforms.Normalize(mean = [ 0.485, 0.456, 0.406 ],
                                               ##         std = [ 0.229, 0.224, 0.225 ])
                                               ]
                                                                        )
                                           )

and

class MothLandmarksDataset(Dataset):
    """Face Landmarks dataset."""

    def __init__(self, csv_file, root_dir, transform=None):
        """
        Args:
            csv_file (string): Path to the csv file with annotations.
            root_dir (string): Directory with all the images.
            transform (callable, optional): Optional transform to be applied
                on a sample.
        """
        self.landmarks_frame = pd.read_csv(csv_file)
        self.root_dir = root_dir
        self.transform = transform

    def __len__(self):
        return len(self.landmarks_frame)

    def __getitem__(self, idx):
        if torch.is_tensor(idx):
            idx = idx.tolist()

        img_name = os.path.join(self.root_dir, self.landmarks_frame.iloc[idx, 0])
        image = io.imread(img_name)
        landmarks = self.landmarks_frame.iloc[idx, 1:]
        landmarks = np.array([landmarks])
        landmarks = landmarks.astype('float').reshape(-1, 2)
        sample = {'image': image, 'landmarks': landmarks}

        if self.transform:
            sample = self.transform(sample)

        return sample

Solution

  • Source code errors

    How to pass these values and where? I assume I should do it in transforms.Compose method but I might be wrong.

    In MothLandmarksDataset it is no wonder it is not working as you are trying to pass Dict (sample) to torchvision.transforms which require either torch.Tensor or PIL.Image as input. here to be exact:

    sample = {'image': image, 'landmarks': landmarks}
    
    if self.transform:
        sample = self.transform(sample)
    

    You could pass sample["image"] into it although you shouldn't. Applying this operation only to sample["image"] would break its relation to landmarks. What you should be after is something like albumentations library (see here) which can transform image and landmarks in the same way to preserve their relations.

    Also there is no Rescale transform in torchvision, maybe you meant Resize?

    Mean and variance for normalization

    Provided code is fine, but you have to unpack your data into torch.Tensor like this:

    mean = 0.0
    std = 0.0
    nb_samples = 0.0
    for data in dataloader:
        images, landmarks = data["image"], data["landmarks"]
        batch_samples = images.size(0)
    
        images_data = images.view(batch_samples, images.size(1), -1)
        mean += images_data.mean(2).sum(0)
        std += images_data.std(2).sum(0)
        nb_samples += batch_samples
    
    mean /= nb_samples
    std /= nb_samples
    

    How to pass these values and where? I assume I should do it in transforms.Compose method but I might be wrong.

    Those values should be passed to torchvision.transforms.Normalize applied only to sample["images"], not to sample["landmarks"].

    I assume I should apply Normalize to my entire dataset not just the training set, am I right?

    You should calculate normalization values across training dataset and apply those calculated values to validation and test as well.