neural-network, deep-learning, caffe, gradient-descent, softmax

Gradient calculation for softmax version of triplet loss


I have been trying to implement the softmax version of the triplet loss in Caffe described in
Hoffer and Ailon, Deep Metric Learning Using Triplet Network, ICLR 2015.

I have tried implementing it, but I am finding it hard to calculate the gradient, since the L2 norm in the exponent is not squared.
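
To make my difficulty concrete (notation is mine): for a difference vector $v = x - x^{+}$, the non-squared norm differentiates as

$$
\frac{\partial \lVert v \rVert_2}{\partial v} = \frac{v}{\lVert v \rVert_2},
$$

whereas the squared norm would simply give $2v$; that extra $1/\lVert v \rVert_2$ factor is what makes the chain rule through the softmax messy for me.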

Can someone please help me here?


Solution

  • Implementing the L2 norm using existing layers of Caffe can save you all the hassle.

    Here's one way to compute ||x1-x2||_2 in Caffe for "bottom"s x1 and x2 (assuming x1 and x2 are B-by-C blobs, so you get B norms of C-dimensional differences):

    layer {
      name: "x1-x2"
      type: "Eltwise"
      bottom: "x1"
      bottom: "x2"
      top: "x1-x2"
      eltwise_param {
        operation: SUM
        coeff: 1 coeff: -1   # computes x1 - x2 element-wise
      }
    }
    layer {
      name: "sqr_norm"
      type: "Reduction"
      bottom: "x1-x2"
      top: "sqr_norm"
      reduction_param { operation: SUMSQ axis: 1 }
    }
    layer {
      name: "sqrt"
      type: "Power"
      bottom: "sqr_norm"
      top: "sqrt"
      power_param { power: 0.5 }
    }
    

    For the triplet loss defined in the paper, you need to compute the L2 norm for x-x+ and for x-x-, concatenate these two blobs, and feed the concatenated blob to a "Softmax" layer (see the sketch below).
    No need for messy manual gradient computations.
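
    A minimal sketch of that last step, under my own naming assumptions: suppose the two per-sample distances (computed by two copies of the sub-net above, one for x-x+ and one for x-x-) live in blobs "dist_pos" and "dist_neg". Depending on your Caffe version, you may need the "Reshape" step below to turn the B-element "Reduction" outputs into B-by-1 blobs before concatenation.

    layer {
      name: "reshape_pos"
      type: "Reshape"
      bottom: "dist_pos"
      top: "dist_pos_2d"
      reshape_param { shape { dim: -1 dim: 1 } }  # B -> B-by-1
    }
    layer {
      name: "reshape_neg"
      type: "Reshape"
      bottom: "dist_neg"
      top: "dist_neg_2d"
      reshape_param { shape { dim: -1 dim: 1 } }  # B -> B-by-1
    }
    layer {
      name: "concat_dists"
      type: "Concat"
      bottom: "dist_pos_2d"
      bottom: "dist_neg_2d"
      top: "dists"
      concat_param { axis: 1 }  # B-by-2: [d+, d-] per sample
    }
    layer {
      name: "softmax_dists"
      type: "Softmax"
      bottom: "dists"
      top: "softmax_dists"
      softmax_param { axis: 1 }  # softmax over the two distances
    }

    Caffe back-propagates through all of these layers automatically, which is exactly why no hand-written gradient is needed.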