By following this tutorial
http://caffe.berkeleyvision.org/gathered/examples/siamese.html
I can build a Siamese network in Caffe that shares the weights of each layer between the two branches.
But I was wondering how Caffe actually updates those shared weights. To be specific, if we have
input1 -> conv1 (shared) -> output1
input2 -> conv1 (shared) -> output2 ===> contrastive loss (computed from output1 and output2),
then does Caffe simply sum the two gradients for conv1 coming from the first and the second branch?
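For reference, the weight sharing in the tutorial's prototxt is expressed by giving the parameter blobs of both branches the same names. Roughly like this (a condensed sketch; fillers, the remaining layers, and the ContrastiveLoss layer are omitted):

    layer {
      name: "conv1"
      type: "Convolution"
      bottom: "data"
      top: "conv1"
      # naming the parameter blobs is what makes them shared
      param { name: "conv1_w" lr_mult: 1 }
      param { name: "conv1_b" lr_mult: 2 }
      convolution_param { num_output: 20 kernel_size: 5 stride: 1 }
    }
    layer {
      name: "conv1_p"
      type: "Convolution"
      bottom: "data_p"
      top: "conv1_p"
      # same param names as above, so this branch reuses the same weights
      param { name: "conv1_w" lr_mult: 1 }
      param { name: "conv1_b" lr_mult: 2 }
      convolution_param { num_output: 20 kernel_size: 5 stride: 1 }
    }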
Thanks in advance for your response.
You are correct: the diffs (gradients) of shared weights (all parameters with the same name) are accumulated. Note that you cannot use different learning-rate multipliers (lr_mult) for shared weights. Other features such as momentum and weight decay should work as expected.
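To illustrate the lr_mult restriction with a hypothetical sketch (layer and param names are just examples): if the two branches gave the shared "conv1_w" blob different multipliers, as below, Caffe should reject the net when it is constructed, since shared parameters must use consistent lr_mult values.

    layer {
      name: "conv1"
      type: "Convolution"
      bottom: "data"
      top: "conv1"
      param { name: "conv1_w" lr_mult: 1 }  # owner of the shared blob
      convolution_param { num_output: 20 kernel_size: 5 }
    }
    layer {
      name: "conv1_p"
      type: "Convolution"
      bottom: "data_p"
      top: "conv1_p"
      param { name: "conv1_w" lr_mult: 2 }  # mismatch: not allowed for a shared parameter
      convolution_param { num_output: 20 kernel_size: 5 }
    }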