I'm working on a nuclear segmentation problem where I'm trying to identify the positions of nuclei in images of stained tissues. The given training dataset has a picture of the stained tissue and a mask with the nuclei positions. Since the dataset was small, I wanted to try data augmentation in PyTorch, but after doing that, for some reason, when I output my mask image, it looks fine, but the corresponding tissue image is incorrect.
All my training images are in X_train
with shape (128, 128, 3)
, the corresponding masks in Y_train
with shape (128, 128, 1)
and similarly the cross-validation images and masks in X_val
and Y_val
respectively.
Y_train
and Y_val
have dtype = np.bool
, X_train
and X_val
have dtype = np.uint8
.
Before data augmentation, I check my images like this:
fig, axis = plt.subplots(2, 2)
axis[0][0].imshow(X_train[0].astype(np.uint8))
axis[0][1].imshow(np.squeeze(Y_train[0]).astype(np.uint8))
axis[1][0].imshow(X_val[0].astype(np.uint8))
axis[1][1].imshow(np.squeeze(Y_val[0]).astype(np.uint8))
The output is as follows: Before Data Augmentation
For the data augmentation, I define a custom class as follows:
Here I have imported torchvision.transforms.functional
as TF
and torchvision.transforms as transforms
. images_np
and masks_np
are the inputs which are numpy arrays.
class Nuc_Seg(Dataset):
def __init__(self, images_np, masks_np):
self.images_np = images_np
self.masks_np = masks_np
def transform(self, image_np, mask_np):
ToPILImage = transforms.ToPILImage()
image = ToPILImage(image_np)
mask = ToPILImage(mask_np.astype(np.int32))
angle = random.uniform(-10, 10)
width, height = image.size
max_dx = 0.2 * width
max_dy = 0.2 * height
translations = (np.round(random.uniform(-max_dx, max_dx)), np.round(random.uniform(-max_dy, max_dy)))
scale = random.uniform(0.8, 1.2)
shear = random.uniform(-0.5, 0.5)
image = TF.affine(image, angle = angle, translate = translations, scale = scale, shear = shear)
mask = TF.affine(mask, angle = angle, translate = translations, scale = scale, shear = shear)
image = TF.to_tensor(image)
mask = TF.to_tensor(mask)
return image, mask
def __len__(self):
return len(self.images_np)
def __getitem__(self, idx):
image_np = self.images_np[idx]
mask_np = self.masks_np[idx]
image, mask = self.transform(image_np, mask_np)
return image, mask
This is followed by:
I have used from torch.utils.data import DataLoader
train_dataset = Nuc_Seg(X_train, Y_train)
train_loader = DataLoader(train_dataset, batch_size = 16, shuffle = True)
val_dataset = Nuc_Seg(X_val, Y_val)
val_loader = DataLoader(val_dataset, batch_size = 16, shuffle = True)
After this step, I try to check on my first set training image and mask using this:
%matplotlib inline
for ex_img, ex_mask in train_loader:
img = ex_img[0]
img = img.reshape(128, 128, 3)
mask = ex_mask[0]
mask = mask.reshape(128, 128)
img = img.numpy()
mask = mask.numpy()
fig, (axis_1, axis_2) = plt.subplots(1, 2)
axis_1.imshow(img.astype(np.uint8))
axis_2.imshow(mask.astype(np.uint8))
break
I get this as my output: After Data Augmentation 1
When I change the line axis_1.imshow(img.astype(np.uint8))
to axis_1.imshow(img)
,
I get this image: After Data Augmentation 2
The images of the mask are correct, but for some reason, the images of nuclei are wrong. With the .astype(np.uint8)
, the tissue image is completely black.
Without the .astype(np.uint8)
, the positions of the nuclei are correct, but the color scheme is all messed up (I expect images like those seen before data augmentation, either gray-ish or pink-ish), plus 9 copies of the same image in a grid are displayed for some reason. Can you please help me get the correct output of the tissue images?
You are converting the images to PyTorch tensors, and in PyTorch the images have size [C, H, W]. When you are visualising them, you are converting the tensors back to NumPy arrays, where images have size [H, W, C]. Therefore you are trying to rearrange the dimensions, but you are using torch.reshape
, which doesn't swap the dimensions, but only partitions the data in a different way.
An example makes this clearer:
# Incrementing numbers with size 2 x 3 x 3
image = torch.arange(2 * 3 * 3).reshape(2, 3, 3)
# => tensor([[[ 0, 1, 2],
# [ 3, 4, 5],
# [ 6, 7, 8]],
#
# [[ 9, 10, 11],
# [12, 13, 14],
# [15, 16, 17]]])
# Reshape keeps the same order of elements but for a different size
# The numbers are still incrementing from left to right
image.reshape(3, 3, 2)
# => tensor([[[ 0, 1],
# [ 2, 3],
# [ 4, 5]],
#
# [[ 6, 7],
# [ 8, 9],
# [10, 11]],
#
# [[12, 13],
# [14, 15],
# [16, 17]]])
To reorder the dimensions you can use permute
:
# Dimensions are swapped
# Now the numbers increment from top to bottom
image.permute(1, 2, 0)
# => tensor([[[ 0, 9],
# [ 1, 10],
# [ 2, 11]],
#
# [[ 3, 12],
# [ 4, 13],
# [ 5, 14]],
#
# [[ 6, 15],
# [ 7, 16],
# [ 8, 17]]])
With the
.astype(np.uint8)
, the tissue image is completely black.
PyTorch images are represented as floats with values between [0, 1], but NumPy uses integer values between [0, 255]. Casting the float values to np.uint8
will result in only 0s and 1s, where everything that was not equal to 1, will be set to 0, therefore the whole image is black.
You need to multiply the values by 255 to bring them into range of [0, 255].
img = img.permute(1, 2, 0) * 255
img = img.numpy().astype(np.uint8)
This conversion is also automatically done when you are converting a tensor to a PIL image with transforms.ToPILImage
(or with TF.to_pil_image
if you prefer the functional version) and the PIL image can be converted directly to a NumPy array. With that you don't have to worry about the dimensions, value ranges or type and the code above can be replaced with:
img = np.array(TF.to_pil_image(img))