Why am I getting "RuntimeError: Trying to backward through the graph a second time"?

My code:

import torch
import random


image_width, image_height = 128, 128

def apply_ellipse_mask(img, pos, axes):
    r = torch.arange(image_height)[:, None]
    c = torch.arange(image_width)[None, :]
    val_array = ((c - pos[0]) ** 2) / axes[0] ** 2 + ((r - pos[1]) ** 2) / axes[1] ** 2
    mask = torch.where((0.9 < val_array) & (val_array < 1), torch.tensor(1.0), torch.tensor(0.0))

    return img * (1.0 - mask) + mask


random.seed(0xced)

sphere_radius = image_height / 3
sphere_position = torch.tensor([image_width / 2, image_height / 2 ,0], requires_grad=True)

ref_image = apply_ellipse_mask(torch.zeros(image_width, image_height, requires_grad=True), sphere_position, [sphere_radius, sphere_radius, sphere_radius])

ellipsoid_pos = torch.tensor([sphere_position[0], sphere_position[1], 0], requires_grad=True)
ellipsoid_axes = torch.tensor([image_width / 3 + (random.random() - 0.5) * image_width / 5, image_height / 3 + (random.random() - 0.5) * image_height / 5, image_height / 2], requires_grad=True)

optimizer = torch.optim.Adam([ellipsoid_axes], lr=0.1)
criterion = torch.nn.MSELoss()
for _ in range(100):

    optimizer.zero_grad()
    current_image = torch.zeros(image_width, image_height, requires_grad=True)
    current_image = apply_ellipse_mask(current_image, ellipsoid_pos, ellipsoid_axes)

    loss = criterion(current_image, ref_image)
    loss.backward()
    print(_, loss)
    optimizer.step()

Error:

RuntimeError: Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved tensors after calling backward.

Why would it be trying to backward through the same graph a second time? Am I directly accessing saved tensors after they were freed?

Solution

You have created a lot of leaf nodes (gradient-requiring variables), including:

ref_image = apply_ellipse_mask(torch.zeros(image_width, image_height, requires_grad=True), sphere_position, [sphere_radius, sphere_radius, sphere_radius])

This creates a leaf node (torch.zeros(image_width, image_height, requires_grad=True)) and applies some computations to it, so ref_image carries its own computation graph. You then reuse that result in every iteration without recomputing it, so each loss.backward() call tries to go backward through the same graph again, after its saved tensors were already freed by the first call. The only tensors that should have requires_grad=True are the parameters you optimize. Build ref_image from tensors that do not require gradients, or detach it.
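A minimal sketch of the failure and the fix, using a small hypothetical parameter w and made-up constants in place of the ellipse code. The target built from a gradient-requiring leaf fails on the second backward call, while a target without a graph can be reused freely:

```python
import torch

# Hypothetical stand-in for the parameter being optimized
w = torch.tensor([1.0, 2.0], requires_grad=True)

# Target built from a leaf that requires grad: it carries its own graph,
# and the multiplication saves a tensor that is freed after the first backward.
leaf = torch.zeros(2, requires_grad=True)
bad_target = leaf * torch.tensor([3.0, 4.0])

errors = []
for step in range(2):
    loss = ((w * 2.0 - bad_target) ** 2).mean()
    try:
        loss.backward()
    except RuntimeError as exc:
        # Raised on the second iteration, when bad_target's graph is reused
        errors.append((step, str(exc)))

# Fix: build the target without requires_grad, so it has no graph at all.
# bad_target.detach() would work as well.
good_target = torch.zeros(2) * torch.tensor([3.0, 4.0])

for step in range(3):
    w.grad = None
    loss = ((w * 2.0 - good_target) ** 2).mean()
    loss.backward()  # a fresh graph is built every iteration, so this is fine
```

In the question's code, the same applies to ref_image: dropping requires_grad=True from the torch.zeros(...) call used to build it (or wrapping its construction in torch.no_grad()) should remove the error.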

You're having a differentiability problem

You're trying to flow gradients to ellipsoid_axes through the computation of the mask, but that computation is not differentiable with respect to the axes: torch.where selects between the constants 0 and 1, so the mask is piecewise constant and no gradient reaches ellipsoid_axes. You should make the mask "soft" instead, for example with a sigmoid.
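One possible soft mask, as a sketch rather than a definitive implementation: the hard band 0.9 < val < 1 is replaced by a product of two sigmoids. The function name apply_soft_ellipse_mask, the sharpness parameter, and the initial axes values are assumptions made for this illustration:

```python
import torch

image_width, image_height = 128, 128

def apply_soft_ellipse_mask(img, pos, axes, sharpness=50.0):
    # sharpness is a hypothetical knob: larger values approach the hard 0/1 mask
    r = torch.arange(image_height, dtype=torch.float32)[:, None]
    c = torch.arange(image_width, dtype=torch.float32)[None, :]
    val = ((c - pos[0]) ** 2) / axes[0] ** 2 + ((r - pos[1]) ** 2) / axes[1] ** 2
    # Product of two sigmoids: peaks inside the band 0.9 < val < 1 and
    # decays smoothly outside it, so it is differentiable in axes
    mask = torch.sigmoid(sharpness * (val - 0.9)) * torch.sigmoid(sharpness * (1.0 - val))
    return img * (1.0 - mask) + mask

pos = torch.tensor([image_width / 2, image_height / 2])

# The reference image is a constant target, so build it without a graph
with torch.no_grad():
    ref_axes = torch.tensor([image_height / 3, image_height / 3])
    ref_image = apply_soft_ellipse_mask(torch.zeros(image_height, image_width), pos, ref_axes)

# Only the parameters being optimized require gradients
axes = torch.tensor([50.0, 35.0], requires_grad=True)
optimizer = torch.optim.Adam([axes], lr=0.1)
criterion = torch.nn.MSELoss()

losses = []
for _ in range(20):
    optimizer.zero_grad()
    current = apply_soft_ellipse_mask(torch.zeros(image_height, image_width), pos, axes)
    loss = criterion(current, ref_image)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())
```

Whether this converges to the reference axes depends on the sharpness and on how far the initial ellipse is from the target, since a thin ring gives little gradient signal when the two outlines barely overlap.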

On your apply_ellipse_mask function

This is more of a side comment, as this code will still cause the same error. Avoid Python for-loops over pixels in PyTorch, as they are slow. Instead you could write:

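def apply_ellipse_mask(img, pos, axes):
    # Row and column index grids, broadcast to shape (image_height, image_width)
    r = torch.arange(image_height)[:, None]
    c = torch.arange(image_width)[None, :]
    # Normalized squared distance from the ellipse center; equals 1 on the boundary
    val_array = ((c - pos[0])**2) / axes[0]**2 + ((r - pos[1])**2) / axes[1]**2
    # Chained comparisons do not work on tensors, so combine two comparisons with &
    mask = torch.where((0.9 < val_array) & (val_array < 1), torch.tensor(1.0), torch.tensor(0.0))

    return img * (1.0 - mask) + mask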