Tags: optimization, pytorch, tensor

torch.optim returns "ValueError: can't optimize a non-leaf Tensor" for multidimensional tensor


I am trying to optimize the translations of the vertices of a scene with torch.optim.Adam. The code is a piece from the redner tutorial series, which works fine with its initial setting. It optimizes the scene by shifting all the vertices by the same value, called translation. Here is the original code:

vertices = []
for obj in base:
    vertices.append(obj.vertices.clone())

def model(translation):
    for obj, v in zip(base, vertices):
        obj.vertices = v + translation
    # Assemble the 3D scene.
    scene = pyredner.Scene(camera = camera, objects = objects)
    # Render the scene.
    img = pyredner.render_albedo(scene)
    return img

# Initial guess
# Set requires_grad=True since we want to optimize them later

translation = torch.tensor([10.0, -10.0, 10.0], device = pyredner.get_device(), requires_grad=True)

init = model(translation)
# Visualize the initial guess

t_optimizer = torch.optim.Adam([translation], lr=0.5)

I tried to modify the code so that it calculates an individual translation for each vertex. For this I applied the following modifications to the code above, which change the shape of translation from torch.Size([3]) to torch.Size([43380, 3]):

# translation = torch.tensor([10.0, -10.0, 10.0], device = pyredner.get_device(), requires_grad=True)
translation = base[0].vertices.clone().detach().requires_grad_(True)
translation[:] = 10.0

This introduces ValueError: can't optimize a non-leaf Tensor. Could you please help me work around this problem?

PS: I am sorry for the long text; I am very new to this subject and wanted to state the problem as comprehensively as possible.


Solution

  • Only leaf tensors can be optimised. A leaf tensor is a tensor that was created at the beginning of the graph, i.e. there is no operation tracked in the graph that produced it. In other words, when you apply any operation to a tensor with requires_grad=True, PyTorch keeps track of these operations so it can do the back propagation later. You cannot give one of these intermediate results to the optimiser.

    An example shows that more clearly:

    weight = torch.randn((2, 2), requires_grad=True)
    # => tensor([[ 1.5559,  0.4560],
    #            [-1.4852, -0.8837]], requires_grad=True)
    
    weight.is_leaf # => True
    
    result = weight * 2
    # => tensor([[ 3.1118,  0.9121],
    #            [-2.9705, -1.7675]], grad_fn=<MulBackward0>)
    # grad_fn defines how to do the back propagation (kept track of the multiplication)
    
    result.is_leaf # => False
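
    For illustration, passing this non-leaf tensor to the optimiser reproduces exactly the error from your question:

    torch.optim.Adam([result], lr=0.5)
    # => ValueError: can't optimize a non-leaf Tensor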
    

    The result in this example cannot be optimised, since it is not a leaf tensor. Similarly, in your case translation is not a leaf tensor because of the operation you perform after it was created:

    translation[:] = 10.0
    translation.is_leaf # => False
    

    This tensor has grad_fn=<CopySlices>, therefore it is not a leaf and you cannot pass it to the optimiser. To avoid that, do the in-place assignment while the tensor is still detached from the graph, and only enable gradient tracking afterwards:

    # Not setting requires_grad, so that the next operation is not tracked
    translation = base[0].vertices.clone().detach()
    translation[:] = 10.0
    # Now setting requires_grad so it is tracked in the graph and can be optimised
    translation = translation.requires_grad_(True)
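
    Now translation is a leaf again and can be given to the optimiser exactly as before:

    translation.is_leaf # => True
    t_optimizer = torch.optim.Adam([translation], lr=0.5)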
    

    What you're really doing here is creating a new tensor filled with the value 10.0 that has the same size as the vertices tensor. This can be achieved much more easily with torch.full_like:

    translation = torch.full_like(base[0].vertices, 10.0, requires_grad=True)
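
    With that in place, translation can be passed to the optimiser and the usual render-compare-backpropagate loop applies. A minimal sketch, assuming a reference image called target and an arbitrary iteration count (adapt both to your setup):

    t_optimizer = torch.optim.Adam([translation], lr=0.5)
    for t in range(200):
        t_optimizer.zero_grad()
        # Render with the current per-vertex translation (model is the function from the question)
        img = model(translation)
        # Mean squared error against the reference image
        loss = (img - target).pow(2).mean()
        loss.backward()
        t_optimizer.step()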