tensorflow, nested, gradient, backpropagation, map-function

Backpropagating gradients through nested tf.map_fn


I would like to map a TensorFlow function over the vector along the depth (channel) axis of every pixel in a tensor of shape [batch_size, H, W, n_channels].

In other words, for every image of size H x W that I have in the batch:

  1. I extract some feature maps F_k (n_channels of them), each of the same size H x W (hence, all the feature maps together form a tensor of shape [H, W, n_channels]);
  2. then, I wish to apply a custom function to the vector v_ij that is associated with the i-th row and j-th column of each feature map F_k, but spans the depth channel in its entirety (i.e. v has shape [1 x 1 x n_channels]). Ideally, all of this would happen in parallel.

A picture to explain the process can be found below. The only difference from the picture is that both input and output "receptive fields" have size 1x1 (the function is applied to each pixel independently).

[image not reproduced]

This would be similar to applying a 1x1 convolution to the tensor; however, I need to apply a more general function over the depth channel, rather than a simple weighted sum.
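
To make the comparison with a 1x1 convolution concrete, here is a minimal sketch. It uses NumPy instead of TensorFlow, purely as an assumption to keep it small and runnable, and the names `conv1x1`, `per_pixel`, `kernel` and the sorting example are hypothetical. It shows that a 1x1 convolution is one particular per-pixel function (a matrix product on the channel vector), while an arbitrary function of the channel vector follows the same pattern.

```python
import numpy as np

def conv1x1(x, kernel):
    """1x1 convolution: x is [B, H, W, C_in], kernel is [C_in, C_out]."""
    # every pixel's channel vector is multiplied by the same matrix
    return np.einsum("bhwc,cd->bhwd", x, kernel)

def per_pixel(fn, x):
    """Apply fn to the channel vector of every pixel of x ([B, H, W, C])."""
    return np.apply_along_axis(fn, -1, x)

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 4, 5, 3))
kernel = rng.normal(size=(3, 2))

# the 1x1 convolution is the special case fn(v) = v @ kernel
linear = per_pixel(lambda v: v @ kernel, x)

# a more general, non-linear function of the channel vector:
# sort the channel values of each pixel (hypothetical example)
sorted_channels = per_pixel(np.sort, x)
```

The loop inside `np.apply_along_axis` is only there for illustration; it says nothing about how fast or how differentiable the equivalent TensorFlow graph would be.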

I think tf.map_fn() could be an option, and I tried the following solution, where I call tf.map_fn() recursively to reach the features associated with each pixel. However, this seems sub-optimal and, most importantly, it raises an error when trying to backpropagate the gradients.

Do you have any idea of the reason why this happens and how I should structure my code to avoid the error?

This is my current implementation of the function:

import tensorflow as tf
from tensorflow import layers


def apply_function_on_pixel_features(incoming):
    # at first the input is [None, H, W, n_channels]
    if len(incoming.get_shape()) > 1:
        return tf.map_fn(lambda x: apply_function_on_pixel_features(x), incoming)
    else:
        # here the input is [n_channels]
        # apply some function that applies a transformation and returns a vector of the same size
        output = my_custom_fun(incoming) # my_custom_fun() doesn't change the shape
        return output
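
As a rough picture of what this recursion does, here is a NumPy analogue. It is only an assumption-laden stand-in: plain Python loops replace tf.map_fn, so it does not build a graph and does not reproduce the error. The names `apply_recursively` and `calls` are hypothetical. Each level peels off one leading axis, so a rank-4 input goes through three nested levels before the function sees a vector of shape [n_channels]. In TensorFlow 1.x each of those levels would be a while loop nested inside the previous one, since tf.map_fn is built on tf.while_loop.

```python
import numpy as np

calls = []  # records the shape seen by the innermost function

def apply_recursively(fn, x):
    # mirrors the recursion above: one nested "map" per leading axis
    if x.ndim > 1:
        return np.stack([apply_recursively(fn, sub) for sub in x])
    calls.append(x.shape)
    return fn(x)

x = np.arange(2 * 3 * 4 * 5, dtype=float).reshape(2, 3, 4, 5)
out = apply_recursively(lambda v: v + 1, x)
```

In this eager analogue the innermost function runs once per pixel (batch_size * H * W times) and only ever sees vectors of length n_channels.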

and the body of my code:

H = 128
W = 132
n_channels = 8

x1 = tf.placeholder(tf.float32, [None, H, W, 1])
x2 = layers.conv2d(x1, filters=n_channels, kernel_size=3, padding='same')

# now apply a function to the features vector associated to each pixel
x3 = apply_function_on_pixel_features(x2)  
x4 = tf.nn.softmax(x3)

# cross_entropy, labels and lr are assumed to be defined elsewhere
loss = cross_entropy(x4, labels)
optimizer = tf.train.AdamOptimizer(lr)
train_op = optimizer.minimize(loss)  # <--- ERROR HERE!

Specifically, the error is the following:

File "/home/venvs/tensorflowGPU/lib/python3.6/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2481, in AddOp
    self._AddOpInternal(op)

File "/home/venvs/tensorflowGPU/lib/python3.6/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2509, in _AddOpInternal
    self._MaybeAddControlDependency(op)
File "/home/venvs/tensorflowGPU/lib/python3.6/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2547, in _MaybeAddControlDependency
    op._add_control_input(self.GetControlPivot().op)

AttributeError: 'NoneType' object has no attribute 'op'

The whole error stack and the code can be found here. Thanks for the help,

G.


Update:

Following @thushv89's suggestion, I added a possible solution to the problem. I still don't know why my previous code didn't work. Any insight on this would still be much appreciated.


Solution

  • @gabriele regarding having to depend on batch_size, have you tried doing it the following way? This function does not depend on batch_size. You can replace the map_fn with anything you like.

    def apply_function_on_pixel_features(incoming):
    
        # static input shape is [None, H, W, C]
        _, H, W, C = incoming.get_shape().as_list()
        # collapse batch and spatial dimensions: one row per pixel
        incoming_flat = tf.reshape(incoming, shape=[-1, C])
    
        # apply the function to every per-pixel vector of shape [C]
        out_matrix = tf.map_fn(lambda x: x+1, incoming_flat)  # shape remains unchanged
    
        # go back to the input shape [None, H, W, C]
        out_matrix = tf.reshape(out_matrix, shape=[-1, H, W, C])
    
        return out_matrix
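
The flatten, map, reshape pattern above can be checked for shape and ordering with a small NumPy sketch. This is an illustration under assumptions, not the TensorFlow code itself: a list comprehension stands in for tf.map_fn, and `apply_flat` and `unit_norm` are hypothetical names. It shows that each pixel keeps its position after the round trip and that nothing depends on the batch size.

```python
import numpy as np

def apply_flat(fn, x):
    """Flatten [B, H, W, C] to [B*H*W, C], map fn over rows, restore the shape."""
    _, H, W, C = x.shape
    flat = x.reshape(-1, C)                    # one row per pixel
    out = np.stack([fn(row) for row in flat])  # stand-in for tf.map_fn
    return out.reshape(-1, H, W, C)

def unit_norm(v):
    # hypothetical per-pixel function: scale the channel vector to length 1
    return v / np.linalg.norm(v)

x = np.random.default_rng(0).normal(size=(3, 4, 5, 6))
y = apply_flat(unit_norm, x)

# the same function works unchanged for another batch size
y_single = apply_flat(unit_norm, x[:1])
```

Because the reshape only merges the leading axes, row k of the flattened array is exactly the channel vector of one pixel, which is why the final reshape puts every result back where its input came from.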
    

    The full code I tested is below.

    import numpy as np
    import tensorflow as tf
    from tensorflow.keras.losses import categorical_crossentropy
    
    def apply_function_on_pixel_features(incoming):
    
        # static input shape is [None, H, W, C]
        _, H, W, C = incoming.get_shape().as_list()
        # collapse batch and spatial dimensions: one row per pixel
        incoming_flat = tf.reshape(incoming, shape=[-1, C])
    
        # apply the function to every per-pixel vector of shape [C]
        out_matrix = tf.map_fn(lambda x: x+1, incoming_flat)  # shape remains unchanged
    
        # go back to the input shape [None, H, W, C]
        out_matrix = tf.reshape(out_matrix, shape=[-1, H, W, C])
    
        return out_matrix
    
    H = 32
    W = 32
    x1 = tf.placeholder(tf.float32, [None, H, W, 1])
    labels = tf.placeholder(tf.float32, [None, 10])
    x2 = tf.layers.conv2d(x1, filters=1, kernel_size=3, padding='same')
    
    # now apply a function to the features vector associated to each pixel
    x3 = apply_function_on_pixel_features(x2)  
    x4 = tf.layers.flatten(x3)
    x4 = tf.layers.dense(x4, units=10, activation='softmax')
    
    loss = categorical_crossentropy(labels, x4)
    optimizer = tf.train.AdamOptimizer(0.001)
    train_op = optimizer.minimize(loss)
    
    
    x = np.zeros(shape=(10, H, W, 1))
    y = np.random.choice([0,1], size=(10, 10))
    
    
    with tf.Session() as sess:
      tf.global_variables_initializer().run()
      sess.run(train_op, feed_dict={x1: x, labels:y})
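
Since the answer notes that the map_fn can be replaced, it may be worth checking whether the per-pixel function can be written with operations that act along the last axis, in which case no map is needed at all. The sketch below illustrates this in NumPy under that assumption, using a softmax over the channels as a hypothetical stand-in for the custom function; `softmax_per_pixel` and `softmax_vector` are illustrative names. TensorFlow reductions take an axis argument in the same way, so the same idea should carry over.

```python
import numpy as np

def softmax_per_pixel(x):
    """Softmax over the channel axis of x ([B, H, W, C]), with no explicit loop."""
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def softmax_vector(v):
    """The same function written for a single channel vector of shape [C]."""
    e = np.exp(v - v.max())
    return e / e.sum()

x = np.random.default_rng(1).normal(size=(2, 3, 4, 8))

vectorised = softmax_per_pixel(x)
# reference result: map the single-vector version over every pixel
looped = np.stack([softmax_vector(v) for v in x.reshape(-1, 8)]).reshape(x.shape)
```

Whether this applies depends entirely on what the custom function does; a function that cannot be expressed with axis-wise operations would still need a map over the flattened pixels.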