I'm trying to debug a pretty complex interaction between different nn.Modules. It would be very helpful for me to be able to replace one of them with just an identity network for debugging purposes. For example:
net_a = NetworkA()
net_b = NetworkB()
net_c = NetworkC()
input = torch.autograd.Variable(torch.rand(10, 2))
out = net_a(input)
out = net_b(out)
out = net_c(out)
I would like to be able to just change the second line to net_b = IdentityNet(), instead of having to go through and reconnect all my As to Cs. But when I make a completely empty nn.Module, the optimizer throws ValueError: optimizer got an empty parameter list.
Is there any workaround for this?
A minimal non-working example:
import torch
import torch.nn as nn
import torch.optim as optim

class IdentityModule(nn.Module):
    def forward(self, inputs):
        return inputs

identity = IdentityModule()
opt = optim.Adam(identity.parameters(), lr=0.001)  # raises ValueError: optimizer got an empty parameter list

any_tensor = torch.rand(10, 2)  # stands in for whatever tensor you feed through
out = identity(any_tensor)
error = torch.mean(out)
error.backward()
opt.step()
The problem you encounter here is a logical one. Look at what it means when you do:
error.backward()
opt.step()
.backward() will recursively compute the gradients from your output back to every leaf of the computation graph. There are two noteworthy kinds of leaves here: the input data you pass in, and the nn.Parameters that model the network's behavior. When you then call opt.step(), the optimizer updates the parameters it was handed at construction time, using the gradients that .backward() has just accumulated.
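To make this concrete, here is a minimal sketch (nn.Linear merely stands in for any module that registers parameters; the variable names are illustrative):
import torch
import torch.nn as nn

lin = nn.Linear(2, 1)                      # a module with registered parameters
x = torch.rand(5, 2, requires_grad=True)   # an input we also request gradients for
loss = lin(x).mean()
loss.backward()

print(lin.weight.grad.shape)  # torch.Size([1, 2]) -- gradients for the parameters
print(x.grad.shape)           # torch.Size([5, 2]) -- gradients for the input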
However, your pseudo-code does not contain a single nn.Parameter, since the identity module has none. The optimizer is therefore constructed with an empty parameter list and has nothing it could ever update, which is exactly what the error message says.
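You can verify this directly: a parameter-free module yields an empty .parameters() iterator, which is exactly what the optimizer rejects (a minimal sketch, reusing the IdentityModule defined above):
import torch.nn as nn

print(list(IdentityModule().parameters()))  # [] -- nothing for an optimizer to update
print(list(nn.Linear(2, 2).parameters()))   # weight and bias -- something to update
# Aside: newer PyTorch versions also ship nn.Identity, a built-in pass-through
# module, so you may not need to write IdentityModule yourself.
identity = nn.Identity()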
This does not extend to the case you describe earlier: if you chain a parameter-free module into a larger pipeline together with modules that do have parameters, the computation graph still contains parameters to train.
You do need to make sure that the optimizer gets all of these parameters passed at initialization. A simple trick is to print them:
import torch.nn as nn
import torch.optim as optim

net_a = SomeNetwork()
net_b = IdentityNetwork()  # has no parameters
net_c = SomeNetwork()

print(list(net_a.parameters()))  # will contain whatever parameters net_a has
print(list(net_b.parameters()))  # will be []
print(list(net_c.parameters()))  # will contain whatever parameters net_c has
# to train all of them, you can do one of two things:
# 1. create new module. This works, since `.parameters()` collects params recursively from all submodules.
class NewNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.net_a = net_a
        self.net_b = net_b  # the parameter-free identity module
        self.net_c = net_c

    def forward(self, input):
        return self.net_c(self.net_b(self.net_a(input)))

all_parameters = list(NewNet().parameters())
print(all_parameters)  # a list of all parameters of net_a and net_c

# 2. simply merge the lists
all_parameters = list(net_a.parameters()) + list(net_b.parameters()) + list(net_c.parameters())
print(all_parameters)  # a list of all parameters of net_a and net_c

opt = optim.SGD(all_parameters, lr=0.001)
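With the optimizer built this way, one training step over the whole chain works as usual (a minimal sketch, assuming net_a and net_c map (10, 2) tensors to (10, 2) like in your original example):
x = torch.rand(10, 2)
out = net_c(net_b(net_a(x)))  # net_b just passes its input through
loss = out.mean()

opt.zero_grad()
loss.backward()
opt.step()  # updates the parameters of net_a and net_c; net_b has none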