pytorch transform mnist pytorch-lightning

Adding random positional variance to the MNIST dataset


I am trying to train an autoencoder on the MNIST dataset, where the digits should have a random translation applied to them. Using the torch transforms, I can resize and translate, but this doesn't have the desired effect (the digit gets translated out of frame). Does anyone here know of a transform or some other method that would give me a smaller digit, randomly translated?

I have tried to do so manually using the following code:

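import random
import numpy as np

# dataset[0] is (image, label); assuming ToTensor(), image has shape (1, 28, 28)
image = dataset[0][0][0]  # drop the channel axis -> (28, 28)
background = np.zeros((56, 56))
topLeft = (random.randint(0, 27), random.randint(0, 27))
background[topLeft[0]:topLeft[0]+28, topLeft[1]:topLeft[1]+28] = image  # image is already 2-D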

but I am unable to do this transformation on the actual MNIST set. Any help would be greatly appreciated.


Solution

  • I have done it with an affine transform (RandomAffine):

    from PIL import Image
    from pathlib import Path
    import matplotlib.pyplot as plt
    
    import torch
    from torchvision.transforms import v2
    
    plt.rcParams["savefig.bbox"] = 'tight'
    
    
    torch.manual_seed(0)
    
    # you can download the assets and the
    # helpers from https://github.com/pytorch/vision/tree/main/gallery/
    from helpers import plot
    orig_img = Image.open(Path('gallery/assets/astronaut.jpg'))
    
    # degrees=0 disables rotation; scale=(0.5, 0.5) always shrinks the image to half size
    affine_transformer = v2.RandomAffine(degrees=0, translate=(0.1, 0.3), scale=(0.5, 0.5))
    affine_imgs = [affine_transformer(orig_img) for _ in range(4)]
    plot([orig_img] + affine_imgs)
    


    On top of this, you can also resize the image to 56x56.
    The torchvision gallery referenced in the code above has more details. You can play with the translate and scale parameters to shift the image away from the center.

    I hope this helps