
Calculating the mean and std of an array of torch tensors


I am trying to calculate the mean and std for an array of torch tensors. My dataset has 720 training images, and each of these images has 4 landmarks, with X and Y representing a 2D point on the image.

to_tensor = transforms.ToTensor()

landmarks_arr = []

for i in range(len(train_dataset)):
    landmarks_arr.append(to_tensor(train_dataset[i]['landmarks']))
                     
mean = torch.mean(torch.stack(landmarks_arr, dim=0))#, dim=(0, 2, 3))
std = torch.std(torch.stack(landmarks_arr, dim=0)) #, dim=(0, 2, 3))



print(mean.shape)
print("mean is {} and std is {}".format(mean, std))

Result:

torch.Size([])
mean is nan and std is nan

There are a couple of problems above:

  1. Why is to_tensor not converting the values to the range [0, 1]?
  2. How do I calculate the mean correctly?
  3. Should I divide by 255, and if so, where?

I have:

len(landmarks_arr)
    
720

and

landmarks_arr[0].shape

torch.Size([1, 4, 2])

and

landmarks_arr[0]

tensor([[[502.2869, 240.4949],
         [688.0000, 293.0000],
         [346.0000, 317.0000],
         [560.8283, 322.6830]]], dtype=torch.float64)

Solution

    1. From the PyTorch docs for ToTensor():

    Converts a PIL Image or numpy.ndarray (H x W x C) in the range [0, 255] to a torch.FloatTensor of shape (C x H x W) in the range [0.0, 1.0] if the PIL Image belongs to one of the modes (L, LA, P, I, F, RGB, YCbCr, RGBA, CMYK, 1) or if the numpy.ndarray has dtype = np.uint8

    In the other cases, tensors are returned without scaling.

    Since your landmark values are not a PIL image, and the underlying numpy array is float64 rather than uint8, no scaling is applied.
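    For illustration, a quick check of this behavior (a minimal sketch; the arrays here are made up for the demo): a uint8 array gets scaled, while a float64 array is returned unchanged:

    import numpy as np
    from torchvision import transforms

    to_tensor = transforms.ToTensor()

    # uint8 ndarray in [0, 255]: scaled to [0.0, 1.0]
    print(to_tensor(np.array([[0, 128, 255]], dtype=np.uint8)))
    # tensor([[[0.0000, 0.5020, 1.0000]]])

    # float64 ndarray: returned without scaling
    print(to_tensor(np.array([[502.3, 240.5]], dtype=np.float64)))
    # tensor([[[502.3000, 240.5000]]], dtype=torch.float64)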

    2. Your calculation appears correct. It seems that you might have some NaN values within your data.

    You can try something like

    for i in range(len(train_dataset)):
        landmarks = to_tensor(train_dataset[i]['landmarks'])
        landmarks[landmarks != landmarks] = 0  # NaN != NaN, so this sets every NaN entry to zero
        landmarks_arr.append(landmarks)
    

    within your loop. Or assert on NaN within the loop to find the culprit(s):

    for i in range(len(train_dataset)):
        landmarks = to_tensor(train_dataset[i]['landmarks'])
        assert not torch.isnan(landmarks).any(), f'NaN encountered in sample {i}'  # triggers if any landmark contains NaN
        landmarks_arr.append(landmarks)
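
    As a side note, newer PyTorch versions also provide torch.nan_to_num, which replaces the manual masking above:

    landmarks = torch.nan_to_num(landmarks)  # replaces NaN with 0.0 by default

    Once the NaNs are handled, you can also compute per-coordinate statistics instead of a single scalar. A minimal sketch, assuming landmarks_arr has been filled as above with tensors of shape (1, 4, 2):

    stacked = torch.stack(landmarks_arr, dim=0)  # shape: (720, 1, 4, 2)
    flat = stacked.reshape(-1, 2)                # one (x, y) row per landmark
    mean = flat.mean(dim=0)                      # per-coordinate mean, shape: (2,)
    std = flat.std(dim=0)                        # per-coordinate std, shape: (2,)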
    
    3. No, see 1). You could, however, divide by the maximum coordinates of the landmarks to constrain them to [0, 1] if you so desire, as sketched below.
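
    For example, a minimal sketch (assuming all coordinates are non-negative) that rescales each coordinate by its dataset-wide maximum:

    stacked = torch.stack(landmarks_arr, dim=0)        # shape: (720, 1, 4, 2)
    max_xy = stacked.reshape(-1, 2).max(dim=0).values  # per-coordinate maxima, shape: (2,)
    normalized = stacked / max_xy                      # broadcasts each (x, y) into [0, 1]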
