How do I load the CelebA dataset on Google Colab, using torchvision, without running out of memory?


I am following a tutorial on DCGAN. Whenever I try to load the CelebA dataset, torchvision uses up all of my runtime's memory (12 GB) and the runtime crashes. I am looking for a way to load the dataset and apply transformations to it without exhausting the runtime's resources.

To Reproduce

Here is the part of the code that is causing issues.

from torchvision import datasets, transforms

# Root directory for the dataset
data_root = 'data/celeba'
# Spatial size of training images, images are resized to this size.
image_size = 64

celeba_data = datasets.CelebA(data_root,
                              download=True,
                              transform=transforms.Compose([
                                  transforms.Resize(image_size),
                                  transforms.CenterCrop(image_size),
                                  transforms.ToTensor(),
                                  transforms.Normalize(mean=[0.5, 0.5, 0.5],
                                                       std=[0.5, 0.5, 0.5])
                              ]))

The full notebook can be found here

Environment

  • PyTorch version: 1.7.1+cu101

  • Is debug build: False

  • CUDA used to build PyTorch: 10.1

  • ROCM used to build PyTorch: N/A

  • OS: Ubuntu 18.04.5 LTS (x86_64)

  • GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0

  • Clang version: 6.0.0-1ubuntu2 (tags/RELEASE_600/final)

  • CMake version: version 3.12.0

  • Python version: 3.6 (64-bit runtime)

  • Is CUDA available: True

  • CUDA runtime version: 10.1.243

  • GPU models and configuration: GPU 0: Tesla T4

  • Nvidia driver version: 418.67

  • cuDNN version: /usr/lib/x86_64-linux-gnu/libcudnn.so.7.6.5

  • HIP runtime version: N/A

  • MIOpen runtime version: N/A

Versions of relevant libraries:

  • [pip3] numpy==1.19.4
  • [pip3] torch==1.7.1+cu101
  • [pip3] torchaudio==0.7.2
  • [pip3] torchsummary==1.5.1
  • [pip3] torchtext==0.3.1
  • [pip3] torchvision==0.8.2+cu101
  • [conda] Could not collect

Additional Context

Some of the things I have tried are:

  • Downloading and loading the dataset on separate lines, e.g.:
# Download the dataset only
datasets.CelebA(data_root, download=True)
# Load the dataset here
celeba_data = datasets.CelebA(data_root, download=False, transform=...)
  • Using the ImageFolder dataset class instead of the CelebA class, e.g.:
# Download the dataset only
datasets.CelebA(data_root, download=True)
# Load the dataset using the ImageFolder class
celeba_data = datasets.ImageFolder(data_root, transform=...)

The memory problem persists in both cases.
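To narrow down which step is responsible, peak memory can be logged around each one. The following is a minimal sketch using only the standard library, assuming a Linux runtime such as Colab; `peak_rss_mb` and `report_peak` are hypothetical helper names, not part of torchvision.

```python
import resource
import sys


def peak_rss_mb():
    """Return this process's peak resident set size in MB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == 'darwin' else 1024
    return peak / divisor


def report_peak(label, step):
    """Run step() and print the peak RSS before and after it."""
    before = peak_rss_mb()
    # Flush so this line is visible even if the runtime dies inside step().
    print(f'{label}: starting at peak RSS {before:.0f} MB', flush=True)
    result = step()
    after = peak_rss_mb()
    print(f'{label}: finished at peak RSS {after:.0f} MB', flush=True)
    return result
```

For example, the download and the load can be wrapped separately, as in `report_peak('CelebA load', lambda: datasets.CelebA(data_root, download=False))`. The last "starting" line printed before a crash points at the step that ran out of memory. Peak RSS is a high-water mark for the current process only, so it never decreases and does not include DataLoader worker processes.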


Solution

  • I did not manage to find a solution to the memory problem. However, I came up with a workaround: a custom dataset. Here is my implementation:

    import os
    import zipfile 
    import gdown
    import torch
    from natsort import natsorted
    from PIL import Image
    from torch.utils.data import Dataset
    from torchvision import transforms
    
    ## Setup
    # Number of GPUs available
    ngpu = 1
    device = torch.device('cuda:0' if (
        torch.cuda.is_available() and ngpu > 0) else 'cpu')
    
    ## Fetch data from Google Drive 
    # Root directory for the dataset
    data_root = 'data/celeba'
    # Path to folder with the dataset
    dataset_folder = f'{data_root}/img_align_celeba'
    # URL for the CelebA dataset
    url = 'https://drive.google.com/uc?id=1cNIac61PSA_LqDFYFUeyaQYekYPc75NH'
    # Path to download the dataset to
    download_path = f'{data_root}/img_align_celeba.zip'
    
    # Create required directories (including data_root) if they do not exist yet
    os.makedirs(dataset_folder, exist_ok=True)
    
    # Download the dataset from Google Drive
    gdown.download(url, download_path, quiet=False)
    
    # Unzip the downloaded file 
    with zipfile.ZipFile(download_path, 'r') as ziphandler:
      ziphandler.extractall(dataset_folder)
    
    ## Create a custom Dataset class
    class CelebADataset(Dataset):
      def __init__(self, root_dir, transform=None):
        """
        Args:
          root_dir (string): Directory with all the images
          transform (callable, optional): transform to be applied to each image sample
        """
        # Read names of images in the root directory
        image_names = os.listdir(root_dir)
    
        self.root_dir = root_dir
        self.transform = transform 
        self.image_names = natsorted(image_names)
    
      def __len__(self): 
        return len(self.image_names)
    
      def __getitem__(self, idx):
        # Get the path to the image 
        img_path = os.path.join(self.root_dir, self.image_names[idx])
        # Load image and convert it to RGB
        img = Image.open(img_path).convert('RGB')
        # Apply transformations to the image
        if self.transform:
          img = self.transform(img)
    
        return img
    
    ## Load the dataset 
    # Path to directory with all the images
    img_folder = f'{dataset_folder}/img_align_celeba'
    # Spatial size of training images, images are resized to this size.
    image_size = 64
    # Transformations to be applied to each individual image sample
    transform = transforms.Compose([
        transforms.Resize(image_size),
        transforms.CenterCrop(image_size),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.5, 0.5, 0.5],
                             std=[0.5, 0.5, 0.5])
    ])
    # Load the dataset from file and apply transformations
    celeba_dataset = CelebADataset(img_folder, transform)
    
    ## Create a dataloader 
    # Batch size during training
    batch_size = 128
    # Number of workers for the dataloader
    num_workers = 0 if device.type == 'cuda' else 2
    # Whether to put fetched data tensors to pinned memory
    pin_memory = device.type == 'cuda'
    
    celeba_dataloader = torch.utils.data.DataLoader(celeba_dataset,
                                                    batch_size=batch_size,
                                                    num_workers=num_workers,
                                                    pin_memory=pin_memory,
                                                    shuffle=True)
    

    This implementation is memory-efficient and works for my use case; even during training, memory usage averages around 4 GB. I would, however, appreciate further insight into what might be causing the memory problems.
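One practical note on the unzip step in the code above: on Colab, re-running the cell repeats the extraction every time. A minimal sketch that skips it when the target folder already has content could look like this; `extract_once` is a hypothetical helper name, and the check assumes that a non-empty folder means an earlier extraction completed.

```python
import os
import zipfile


def extract_once(zip_path, target_dir):
    """Extract zip_path into target_dir unless target_dir already has content.

    Returns True if extraction ran, False if it was skipped.
    """
    # Assumption: a non-empty target folder means a previous run finished.
    if os.path.isdir(target_dir) and os.listdir(target_dir):
        return False
    os.makedirs(target_dir, exist_ok=True)
    with zipfile.ZipFile(zip_path, 'r') as ziphandler:
        ziphandler.extractall(target_dir)
    return True
```

With the names from the implementation above, the call would be `extract_once(download_path, dataset_folder)`. A partially extracted folder left behind by an interrupted run would be treated as complete, so it should be deleted before retrying.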