Tags: deep-learning, pytorch, google-colaboratory

How to load one model’s output as another model’s parameters and do end-to-end optimization


Here is an abstract description of the problem:

Assume that we have two models: a ResNet and an EfficientNet.

The first model (the ResNet) is defined as follows:

class ResNet(nn.Module):
    def __init__(self, in_channels, out_channels, num_classes):
        super().__init__()
        self.conv1_0 = _conv3x3(3, 32, stride=2)
        self.bn1_0 = _bn(32)
        self.conv1_1 = _conv3x3(32, 32, stride=1)
        self.bn1_1 = _bn(32)
        self.conv1_2 = _conv3x3(32, 64, stride=1)
        self.relu = nn.ReLU()
        self.pad = torch.nn.ReplicationPad2d(padding=(0, 0, 1, 1))
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2)
        self.end_point = _fc(225, num_classes)

    def forward(self, x):
        x = self.conv1_0(x)
        # ...and so on through the remaining layers...
        out = self.end_point(x)
        return out

while the second model (the EfficientNet) is downloaded as follows:

efficientNet = models.efficientnet_b5().to(device)

So we have two models: the first is written from scratch and the second is imported from the torchvision.models library.

Now we want the EfficientNet to take the output of the ResNet as its input, and then to optimize end-to-end, from EfficientNet's output all the way back to the first layer of the ResNet.

I know an easier way, which consists of changing the code and directly using the ResNet's result as the input of EfficientNet's forward(). However, the EfficientNet model is too complex, so making such a change is difficult.

For mathematical reasons, assume that we also have another simple model between the ResNet and the EfficientNet, called gumbel_model.

So, in conclusion, we have three models, but we only have target labels for the last one, which means we can only compute the loss at the output of the last model (the EfficientNet).
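To make the data flow concrete, here is a minimal sketch of that three-stage pipeline. The tiny modules below are hypothetical placeholders (they stand in for the real ResNet, gumbel_model and EfficientNet, which are wired up the same way); the point is only that chaining the forward calls builds a single autograd graph across all three models:

import torch
import torch.nn as nn

# Hypothetical stand-in modules, purely for illustration; the real ResNet,
# gumbel_model and EfficientNet are chained in exactly the same way.
predictor = nn.Linear(8, 8)     # plays the role of the hand-written ResNet
gumbel = nn.Linear(8, 8)        # plays the role of gumbel_model
efficient = nn.Linear(8, 2)     # plays the role of torchvision's EfficientNet

x = torch.randn(4, 8)
# Feeding each model's output into the next builds one computation graph
# spanning all three models, so gradients can flow end to end.
out = efficient(gumbel(predictor(x)))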

When we calculate the loss of the last model, we write the usual three lines:

optimizer.zero_grad()
loss.backward()
optimizer.step()

where the optimizer is defined as:

optimizer = optim.SGD([dict(params=efficientNet.parameters(), lr=LR)])

Is it correct that we are backpropagating only through the last model? To backpropagate end-to-end, is it necessary to add the ResNet's parameters to the first argument of optim.SGD? If yes, how can we get the parameters of the ResNet defined above?
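For reference, the hand-written ResNet is an nn.Module like any other, so its trainable tensors can be obtained with .parameters(); a small sketch, assuming predictor is the ResNet instance used in the training loop below:

# Every nn.Module exposes its trainable tensors through .parameters().
resnet_params = list(predictor.parameters())
print(len(resnet_params), sum(p.numel() for p in resnet_params))

# The optimizer above was built only from efficientNet.parameters(), so none
# of the ResNet tensors appear in its param_groups and none of them get stepped.
for group in optimizer.param_groups:
    print(len(group["params"]))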

I have tried the following code for one epoch:

efficientNet = get_efficient_trained(device, out_features=1, in_features=3, path_model=path_model)
predictor = ResNet(ResidualBlockBase, layer_config, num_classes=num_cl)
# (the gumbel model used below is assumed to be constructed elsewhere)

for i, imgs in enumerate(dataloader):
    inputs, labels = imgs
    inputs, labels = inputs.to(device), labels.to(device)
    # Chain the models: ResNet -> gumbel -> EfficientNet
    predictor_output = predictor(inputs)
    predictor_gumbel_output = gumbel(predictor_output)
    optimizer.zero_grad()
    outputs = torch.squeeze(efficientNet(predictor_gumbel_output), 1)
    loss = loss_fn(outputs, labels)
    loss.backward()
    optimizer.step()
return efficientNet

But from the results, I have the impression that only the EfficientNet is actually being trained.
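One way to test that suspicion is to inspect the gradients stored on the ResNet's parameters right after loss.backward(); a quick sketch, assuming the loop above has just run a backward pass:

# If backpropagation reaches the ResNet, its parameters carry gradients even
# though the optimizer never updates them (it only holds EfficientNet's params).
for name, p in predictor.named_parameters():
    has_grad = p.grad is not None and p.grad.abs().sum().item() > 0
    print(f"{name}: gradient present = {has_grad}")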

Is there any way to let the backward optimization reach the ResNet? How can I solve this type of problem?

I am waiting for responses, and I thank you all in advance.


Solution

  • The params argument of the optimizer specifies all the parameters you want to optimize. Here, as you are only passing the parameters of EfficientNet, only those get optimized, as you suspect.

    To optimize all parameters end-to-end, simply pass them all when initializing the optimizer. This can be done as follows:

    optimizer = optim.SGD(list(efficientNet.parameters()) + list(gumbel.parameters()) + list(predictor.parameters()), lr=LR)
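    If you prefer to keep the dict-based style from the question, or want a different learning rate per model, the same thing can be written with parameter groups (a sketch; each group here reuses the same LR, but the values are up to you):

    optimizer = optim.SGD([
        dict(params=efficientNet.parameters(), lr=LR),
        dict(params=gumbel.parameters(), lr=LR),
        dict(params=predictor.parameters(), lr=LR),
    ])

    Either way, loss.backward() already computes gradients for all three models; what changes is which of those gradients optimizer.step() actually applies.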