python · machine-learning · deep-learning · pytorch

PyTorch DataLoader memory is not released


I'd like to implement SRGAN in PyTorch on Google Colaboratory, but the memory used by the DataLoader does not seem to be released, so as the epochs go on a memory error eventually occurs. I would greatly appreciate advice on how to free the memory after each batch. This is the GitHub link to the code

With a batch size of 48, a memory error occurred in the first epoch. If the batch size is reduced to 8 (a sixth of that), the error occurs around the 6th epoch. I am reading the high-resolution and low-resolution images with the following code, which extends ImageFolder. Also, even when an error interrupts training, the GPU memory is not released.

from torchvision import transforms
from torchvision.datasets import ImageFolder


class DownSizePairImageFolder(ImageFolder):
    def __init__(self, root, transform=None, large_size=256, small_size=64, **kwds):
        super().__init__(root, transform=transform, **kwds)
        # transforms.Scale is deprecated; transforms.Resize is its replacement
        self.large_resizer = transforms.Resize(large_size)
        self.small_resizer = transforms.Resize(small_size)

    def __getitem__(self, index):
        path, _ = self.imgs[index]
        img = self.loader(path)
        # Build a (low-resolution, high-resolution) pair from the same image
        large_img = self.large_resizer(img)
        small_img = self.small_resizer(img)
        if self.transform is not None:
            large_img = self.transform(large_img)
            small_img = self.transform(small_img)
        return small_img, large_img


from torch.utils.data import DataLoader

train_data = DownSizePairImageFolder('./lfw-deepfunneled/train', transform=transforms.ToTensor())
test_data = DownSizePairImageFolder('./lfw-deepfunneled/test', transform=transforms.ToTensor())

batch_size = 8
train_loader = DataLoader(train_data, batch_size, shuffle=True)
test_loader = DataLoader(test_data, batch_size, shuffle=False)
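
For reference, each batch yielded by these loaders is a (small_img, large_img) pair. A quick sanity check (purely illustrative; the printed shapes assume the 250×250 LFW images and the sizes above) would look like this:

small, large = next(iter(train_loader))
print(small.shape)   # e.g. torch.Size([8, 3, 64, 64])
print(large.shape)   # e.g. torch.Size([8, 3, 256, 256])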

Solution

  • PyTorch builds a computational graph each time you propagate through your model. This graph is normally retained until the output variable G_loss goes out of scope, e.g. when a new iteration of the loop starts.

    However, you append this loss to a list. Hence, the variable is still referenced by Python and the graph is not freed. You can use .detach() to detach the variable from the current graph (better than .clone(), which I proposed before, since .clone() also copies the tensor's data); a short standalone demonstration follows below.

    As a little side note: in your train() function, you return D_loss, G_loss inside the for loop, not after it, so you only ever train on the first batch.
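
To make the first point concrete, here is a small, self-contained demonstration (the tensor w and the loop below are made up purely to illustrate the behaviour; they are not part of the SRGAN code):

import torch

w = torch.randn(3, requires_grad=True)

losses_bad, losses_ok = [], []
for step in range(3):
    loss = (w * step).sum()          # this tensor carries a computational graph
    losses_bad.append(loss)          # keeps the whole graph of this iteration alive
    losses_ok.append(loss.detach())  # stores only the value; the graph can be freed

print(losses_bad[0].grad_fn)  # <SumBackward0 ...> -> graph still referenced
print(losses_ok[0].grad_fn)   # None -> no graph retained

In the actual training loop the same idea applies: append G_loss.detach() (or G_loss.item() if you only need the number) to your list, and move the return D_loss, G_loss statement to after the for loop so every batch is processed.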