I'm relatively new to PyTorch and am trying to reproduce an algorithm from an academic paper that approximates a term using the Hessian matrix. I've set up a toy problem so that I can compare the results of the full Hessian with the approximation. I found this gist and have been playing with it to compute the full Hessian part of the algorithm.
I am getting the error: "RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation."
I've scoured through the simple example code, documentation, and many, many forum posts about this issue and cannot find any in-place operations. Any help would be greatly appreciated!
Here is my code:
import torch
import torch.autograd as autograd
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import numpy as np
torch.set_printoptions(precision=20, linewidth=180)
def jacobian(y, x, create_graph=False):
    jac = []
    flat_y = y.reshape(-1)
    grad_y = torch.zeros_like(flat_y)
    for i in range(len(flat_y)):
        grad_y[i] = 1.
        grad_x, = torch.autograd.grad(flat_y, x, grad_y, retain_graph=True, create_graph=create_graph)
        jac.append(grad_x.reshape(x.shape))
        grad_y[i] = 0.
    return torch.stack(jac).reshape(y.shape + x.shape)

def hessian(y, x):
    return jacobian(jacobian(y, x, create_graph=True), x)

def f(x):
    return x * x
np.random.seed(435537698)
num_dims = 2
num_samples = 3
X = [np.random.uniform(size=num_dims) for i in range(num_samples)]
print('X: \n{}\n\n'.format(X))
mean = torch.Tensor(np.mean(X, axis=0))
mean.requires_grad = True
print('mean: \n{}\n\n'.format(mean))
cov = torch.Tensor(np.cov(X, rowvar=False))
print('cov: \n{}\n\n'.format(cov))
with autograd.detect_anomaly():
    hessian_matrices = hessian(f(mean), mean)
    print('hessian: \n{}\n\n'.format(hessian_matrices))
And here is the output with the stack trace:
X:
[array([0.81700949, 0.17141617]), array([0.53579366, 0.31141496]), array([0.49756485, 0.97495776])]
mean:
tensor([0.61678934097290039062, 0.48592963814735412598], requires_grad=True)
cov:
tensor([[ 0.03043144382536411285, -0.05357056483626365662],
[-0.05357056483626365662, 0.18426130712032318115]])
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-3-5a1c492d2873> in <module>()
42
43 with autograd.detect_anomaly():
---> 44 hessian_matrices = hessian(f(mean), mean)
45 print('hessian: \n{}\n\n'.format(hessian_matrices))
2 frames
<ipython-input-3-5a1c492d2873> in hessian(y, x)
21
22 def hessian(y, x):
---> 23 return jacobian(jacobian(y, x, create_graph=True), x)
24
25 def f(x):
<ipython-input-3-5a1c492d2873> in jacobian(y, x, create_graph)
15 for i in range(len(flat_y)):
16 grad_y[i] = 1.
---> 17 grad_x, = torch.autograd.grad(flat_y, x, grad_y, retain_graph=True, create_graph=create_graph)
18 jac.append(grad_x.reshape(x.shape))
19 grad_y[i] = 0.
/usr/local/lib/python3.6/dist-packages/torch/autograd/__init__.py in grad(outputs, inputs, grad_outputs, retain_graph, create_graph, only_inputs, allow_unused)
155 return Variable._execution_engine.run_backward(
156 outputs, grad_outputs, retain_graph, create_graph,
--> 157 inputs, allow_unused)
158
159
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [2]] is at version 4; expected version 3 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
I sincerely thought this was a bug in PyTorch, but after filing a bug report I got a good answer from albanD: https://github.com/pytorch/pytorch/issues/36903#issuecomment-616671247. He also pointed out that https://discuss.pytorch.org/ is available for asking questions.
The problem arises because we traverse the computation graph several times: the inner jacobian call builds a new graph (create_graph=True), and the outer call backpropagates through it again. Exactly what is going on here is beyond me though...
The in-place edits that your error message refers to are the obvious ones, grad_y[i] = 1. and grad_y[i] = 0. Reusing the same grad_y tensor over and over again in the computation is what causes trouble. Redefining jacobian(...) as below works for me.
def jacobian(y, x, create_graph=False):
    jac = []
    flat_y = y.reshape(-1)
    for i in range(len(flat_y)):
        grad_y = torch.zeros_like(flat_y)  # fresh tensor every iteration instead of in-place reuse
        grad_y[i] = 1.
        grad_x, = torch.autograd.grad(flat_y, x, grad_y, retain_graph=True, create_graph=create_graph)
        jac.append(grad_x.reshape(x.shape))
    return torch.stack(jac).reshape(y.shape + x.shape)
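As a quick sanity check (my own addition, not part of the original answer, and reusing the definitions above), the repaired jacobian lets hessian(...) run and produce the expected second derivatives for f(x) = x * x, namely 2 along the diagonal of each output component:
x = torch.tensor([0.5, 1.5], requires_grad=True)
H = hessian(f(x), x)  # shape (2, 2, 2): H[i, j, k] = d^2 f_i / (dx_j dx_k)
print(H)              # entries with i == j == k are 2, everything else is 0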
An alternative that works, but is more like black magic to me, is to leave jacobian(...) as it is and instead redefine f(x) as
def f(x):
    return x * x * 1
That works too.
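For what it is worth, here is a minimal sketch of my own (not from albanD's answer) that reproduces the same failure mode without any of the jacobian machinery; it is how I understand the version check behind the error:
import torch

x = torch.ones(2, requires_grad=True)
v = torch.zeros(2)  # plays the role of grad_y
v[0] = 1.

# create_graph=True builds a graph for g that saves v (roughly g = 2 * v * x)
g, = torch.autograd.grad(x * x, x, v, create_graph=True)

v[0] = 0.  # these in-place edits bump v's version counter...
v[1] = 1.

g.sum().backward()  # ...so backward sees a stale saved tensor and raises the same RuntimeError
Presumably the x * x * 1 trick works for a related reason: the extra multiplication means the backward of x * x is fed the fresh tensor grad_y * 1 rather than grad_y itself, so the later in-place edits never touch anything that was saved for the second pass.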