Tags: lua, neural-network, torch

Why do they clone the entire model before training in torch?


I have been going through a lot of Torch code recently. I have noticed that, usually after the model is constructed, it is cloned, as in the following code:

siamese_1=siamese_1:cuda()
parameters,gradParameters = siamese_1:getParameters()
siamese_2=siamese_1:clone('weight','bias','gradWeight','gradBias')
siamese_net:add(siamese_1)
siamese_net:add(siamese_2)

Here siamese_1 is a previously constructed model.

It is difficult to understand why this is being done.

This code is for fine-tuning networks. It is from this repository (lines 122 to 126).


Solution

  • When you clone a model and specify additional arguments (like 'weight', etc.), the new model will share these parameters with the original one instead of copying them. Thus in your case the models siamese_1 and siamese_2 share their weights, biases and the corresponding gradients.

    In the code you are looking at, the authors want to build a network made of two parallel sub-networks that share their weights, which is why they use the clone function.
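
    Here is a minimal, self-contained sketch of that sharing behaviour (assuming the nn package is available; the module names below are illustrative, not taken from the linked repository):

    require 'nn'

    local net1 = nn.Linear(4, 2)
    -- clone with field names: the new module shares these tensors with net1
    local net2 = net1:clone('weight', 'bias', 'gradWeight', 'gradBias')

    net1.weight:fill(1)
    print(net2.weight)   -- all ones: both modules point to the same storage

    -- a plain clone() copies the parameters instead, so the two modules
    -- would drift apart during training
    local net3 = net1:clone()
    net1.weight:fill(0)
    print(net3.weight)   -- still all ones: net3 owns an independent copy

    Because the gradients are shared as well, a single optimizer step on the flattened parameters returned by getParameters() updates both branches consistently.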