Suppose I have a simple one-hidden-layer network that I'm training in the typical way:
```python
for x, y in trainData:
    optimizer.zero_grad()
    out = self(x)
    loss = self.lossfn(out, y)
    loss.backward()
    optimizer.step()
```
This works as expected, but if I instead pre-allocate and update the output array, I get an error:
```python
out = torch.empty_like(trainData.tensors[1])
for i, (x, y) in enumerate(trainData):
    optimizer.zero_grad()
    out[i] = self(x)
    loss = self.lossfn(out[i], y)
    loss.backward()
    optimizer.step()
```
```
RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time.
```
What's happening in the second version that makes PyTorch try to backward through the graph a second time? Why is this not an issue in the first version? (Note that this error occurs even if I don't call `zero_grad()`.)
The error means the program is trying to backpropagate through the same set of operations a second time. The first time you backpropagate through a computational graph, PyTorch frees that graph to save memory, so a second backward pass through it fails because the graph no longer exists.
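This behavior is easy to reproduce in isolation. A minimal sketch (using an arbitrary tensor, not the question's network):

```python
import torch

# Minimal reproduction: backpropagating twice through the same graph
# fails, because PyTorch frees the graph's buffers after the first
# backward() call.
w = torch.randn(3, requires_grad=True)
loss = (w * 2).sum()

loss.backward()      # first backward succeeds; the graph is then freed
try:
    loss.backward()  # second backward fails: the graph no longer exists
except RuntimeError as err:
    print("RuntimeError:", err)
```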
Use `loss.backward(retain_graph=True)`; this tells PyTorch to keep the computational graph instead of deleting it.
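As a quick sanity check (a standalone sketch, not the question's training loop), `retain_graph=True` makes a second backward pass through the same graph legal, though gradients then accumulate across calls:

```python
import torch

w = torch.randn(3, requires_grad=True)
loss = (w * 2).sum()

loss.backward(retain_graph=True)  # the graph is kept alive
loss.backward()                   # second backward now succeeds

# Gradients accumulate across backward calls: each pass adds 2.0 per
# element, so w.grad is now 4.0 everywhere.
print(w.grad)
```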
In the first version, each loop iteration builds a fresh computational graph when `out = self(x)` runs. Every iteration's graph looks like:

```
out = self(x) -> loss = self.lossfn(out, y)
```
In the second version, since `out` is declared outside the loop, every iteration's graph shares a parent node through `out`:

```
          |- out[i] = self(x) -> loss = self.lossfn(out[i], y)
out ------|- out[i] = self(x) -> loss = self.lossfn(out[i], y)
          |- out[i] = self(x) -> loss = self.lossfn(out[i], y)
```
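If the preallocated `out` is only meant to record predictions, an alternative is to compute the loss on the fresh output and store a detached copy, so each iteration's graph stays independent and no `retain_graph` is needed. This is my own sketch with stand-in model, loss, and data, not code from the question:

```python
import torch
import torch.nn as nn

# Stand-ins for the question's model, loss function, optimizer, and data.
model = nn.Linear(4, 1)
lossfn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
X, Y = torch.randn(8, 4), torch.randn(8, 1)

out = torch.empty_like(Y)
for i in range(len(X)):
    optimizer.zero_grad()
    pred = model(X[i])         # fresh graph built each iteration
    out[i] = pred.detach()     # store the value only; no graph attached
    loss = lossfn(pred, Y[i])  # loss is computed on pred, not on out
    loss.backward()            # no shared parent node, so no error
    optimizer.step()

print("loop finished, out filled:", out.shape)
```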
Therefore, here's a timeline of what happens.