Tags: tensorflow, neural-network, deep-learning, caffe, gradient-descent

TensorFlow: How can I compute the backward pass for a given forward function?


I want to construct an L2-norm layer in Caffe style (well, I actually want to use TensorFlow inside a pycaffe layer, since writing CUDA .cu files for Caffe is an onerous task).

Forward pass:
- input(x): n-D array
- output(y): n-D array with the same shape as the input
- operation:

y = x / sqrt(sum(x^2,axis=(0,1))) # channel wise L2 normalization

import tensorflow as tf


class L2NormLayer:
    def __init__(self):
        self.eps = 1e-12  # small constant for numerical stability
        self.sess = tf.Session()

    def forward(self, in_x):
        # channel-wise L2 normalization: y = x / sqrt(sum(x^2, axis=(0, 1)))
        self.x = tf.constant(in_x)
        self.xp2 = tf.pow(self.x, 2)
        self.sum_xp2 = tf.reduce_sum(self.xp2, axis=(0, 1))
        self.sqrt_sum_xp2 = tf.sqrt(self.sum_xp2 + self.eps)
        self.hat = tf.div(self.x, self.sqrt_sum_xp2)

        return self.sess.run(self.hat)

    def backward(self, dl):
        # 'dl' is the gradient of the loss w.r.t. this layer's output,
        # propagated from the upper layer (chain rule).
        # How do I calculate this gradient automatically using TensorFlow?

        # hand-crafted backward version
        loss = tf.constant(dl)
        # direct path through the numerator: dl / s
        d_x1 = tf.div(loss, self.sqrt_sum_xp2)
        # path through the denominator s = sqrt(sum(x^2) + eps)
        d_sqrt_sum_xp2 = tf.div(-tf.reduce_sum(self.x * loss, axis=(0, 1)),
                                (self.eps + tf.pow(self.sqrt_sum_xp2, 2)))
        d_sum_xp2 = tf.div(d_sqrt_sum_xp2, (self.eps + 2 * tf.sqrt(self.sum_xp2)))
        d_xp2 = tf.ones_like(self.xp2) * d_sum_xp2
        d_x2 = 2 * self.x * d_xp2
        d_x = d_x1 + d_x2

        return self.sess.run(d_x)

As commented in the code, how can I calculate the gradient of the forward-pass function automatically using TensorFlow?
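
For reference, TensorFlow 1.x can build this gradient automatically with tf.gradients, which accepts an optional grad_ys argument for the upstream gradient. The sketch below is only an illustration of that API; the method name backward_auto and the reuse of the tensors cached in forward() are my own assumptions, not part of the original code:

    # sketch: automatic gradient via tf.gradients (assumes TF 1.x graph mode
    # and that forward() has already been called, so self.hat and self.x exist)
    def backward_auto(self, dl):
        upstream = tf.constant(dl)  # upstream gradient w.r.t. self.hat
        d_x = tf.gradients(ys=self.hat, xs=self.x, grad_ys=upstream)[0]
        return self.sess.run(d_x)

Here grad_ys plays the same role as dl in the hand-written backward: it weights the output gradient before it is propagated back to x.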


Solution

  • I think your best strategy would be to use existing Caffe layers to achieve your goal.
    First, use a "Reduction" layer to compute the squared L2 norm of x:

    layer {
      name: "norm_x_sq"
      type: "Reduction"
      bottom: "x"
      top: "norm_x_sq"
      reduction_param { operation: SUMSQ axis: 1 }
    }
    

    Use "Power" layer to take the square root of the norm and compute its reciprocal:

    layer {
      name: "norm_x-1"
      type: "Power"
      bottom: "norm_x_sq"
      top: "norm_x-1"
      power_param { power: -0.5 }
    }
    

    Once you have the denominator, you need to "Tile" it back to the same shape as x:

    layer {
      name: "denom"
      type: "Tile"
      bottom: "norm_x-1"
      top: "denom"
      tile_param { axis:1 tiles: N } # here you'll have to manually put the target dimension N
    }
    

    Finally, use an "Eltwise" layer to normalize x:

    layer {
      name: "x_norm"
      type: "Eltwise"
      bottom: "x"
      bottom: "denom"
      top: "x_norm"
      eltwise_param { operation: PROD }
    }
    

    Some additional notes:
    1. Dividing by the norm might be numerically unstable if the norm is very small. You might want to consider adding a tiny constant to "norm_x_sq" before taking the reciprocal of the square root. You can do that using existing layers as well (see the sketch after these notes).
    2. This example showed how to normalize along the axis=1 dimension. Depending on how your vectors are arranged in the blob, you might be able to use a "Scale" layer for the division instead of the "Tile"+"Eltwise" combination.
    3. You might also find this thread useful.
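
    Regarding note 1, one way to add the stabilizing constant with existing layers (a sketch only; the 1e-12 shift is an arbitrary small value, not one prescribed above) is the shift parameter of the same "Power" layer, since that layer computes (shift + scale * x) ^ power:

    layer {
      name: "norm_x-1"
      type: "Power"
      bottom: "norm_x_sq"
      top: "norm_x-1"
      # (shift + scale * x) ^ power: the small shift acts as the epsilon
      # added to the squared norm before taking the reciprocal square root
      power_param { power: -0.5 scale: 1 shift: 1e-12 }
    }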