Tags: python, cntk, recurrent-neural-network

How to define a Recurrent Convolutional network layer in CNTK?


I am new to CNTK and using its awesome Python API. I am having trouble figuring out how to define a recurrent convolutional network layer, since Recurrence() seems to assume a regular network layer only.

To be more specific, I would like to have recurrence among convolutional layers.

Any pointers or even a simple example would be highly appreciated. Thank you.


Solution

  • There are two ways to do this in a meaningful way (i.e. without destroying the structure of natural images that convolutions rely on). The simplest is to just have an LSTM at the final layer, i.e.

    convnet = C.layers.Sequential([C.layers.Convolution(...), C.layers.MaxPooling(...), C.layers.Convolution(...), ...])
    z = C.layers.Sequential([convnet, C.layers.Recurrence(C.layers.LSTM(100)), C.layers.Dense(10)])
    

    for a 10-class problem.
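
    To make that concrete, here is a hedged, self-contained sketch of the same structure; the frame size (3x32x32), filter counts, and pooling geometry are my own illustrative choices, not part of the answer:

    import cntk as C

    # per-frame convolutional feature extractor (sizes are illustrative)
    with C.layers.default_options(activation=C.relu, pad=True):
        convnet = C.layers.Sequential([
            C.layers.Convolution2D((3,3), 16),
            C.layers.MaxPooling((2,2), strides=(2,2)),
            C.layers.Convolution2D((3,3), 32),
            C.layers.MaxPooling((2,2), strides=(2,2)),
        ])
    # convnet features -> LSTM over the sequence axis -> per-frame class scores
    z = C.layers.Sequential([convnet,
                             C.layers.Recurrence(C.layers.LSTM(100)),
                             C.layers.Dense(10)])
    x = C.sequence.input_variable((3, 32, 32))  # a sequence of RGB frames
    y = z(x)                                    # one 10-way prediction per frame

    (To get a single label per sequence instead of one per frame, you could insert C.sequence.last between the recurrence and the Dense layer.)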

    The more complex way would be to define your own recurrent cell that only uses convolutions and thus respects the structure of natural images. To define a recurrent cell, you need to write a function that takes the previous state and an input (i.e. a single frame, if you are processing video) and outputs the next state and output. For example, you can look into the implementation of the GRU in the CNTK layers module and adapt it to use convolution instead of times everywhere. If this is what you want, I can try to provide such an example. However, I encourage you to try the simple way first.
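
    For orientation, the contract Recurrence expects is a function mapping (previous state, current input) to the next state. Here is a minimal sketch of the simplest such convolutional cell, a plain tanh RNN step; the function and parameter names here are mine, for illustration only, and the full GRU version follows below:

    import cntk as C

    def ConvolutionalRNNStep(kernel_shape, outputs, activation=C.tanh, init=C.glorot_uniform()):
        # Both the input-to-hidden and hidden-to-hidden maps are convolutions,
        # so the spatial structure of the frames is preserved.
        conv_filter_shape = (outputs, C.InferredDimension) + kernel_shape
        W = C.Parameter(conv_filter_shape, init=init, name='W')  # input-to-hidden
        U = C.Parameter(conv_filter_shape, init=init, name='U')  # hidden-to-hidden
        b = C.Parameter((outputs, 1, 1), init=0, name='b')       # bias
        def rnn_step(dh, x):  # dh: previous state, x: current frame
            return activation(b + C.convolution(W, x) + C.convolution(U, dh))
        return rnn_step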

    Update: I wrote a barebones convolutional GRU. You need to pay special attention to how the initial state is defined, but otherwise it seems to work fine. Here's the layer definition:

    import cntk as C

    def ConvolutionalGRU(kernel_shape, outputs, activation=C.tanh, init=C.glorot_uniform(), init_bias=0, name=''):
        conv_filter_shape = (outputs, C.InferredDimension) + kernel_shape
        bias_shape = (outputs, 1, 1)
        # parameters
        bz = C.Parameter(bias_shape, init=init_bias, name='bz')   # update-gate bias
        br = C.Parameter(bias_shape, init=init_bias, name='br')   # reset-gate bias
        bh = C.Parameter(bias_shape, init=init_bias, name='bh')   # candidate bias
        Wz = C.Parameter(conv_filter_shape, init=init, name='Wz') # input
        Wr = C.Parameter(conv_filter_shape, init=init, name='Wr') # input
        Uz = C.Parameter(conv_filter_shape, init=init, name='Uz') # hidden-to-hidden
        Ur = C.Parameter(conv_filter_shape, init=init, name='Ur') # hidden-to-hidden
        Wh = C.Parameter(conv_filter_shape, init=init, name='Wh') # input
        Uh = C.Parameter(conv_filter_shape, init=init, name='Uh') # hidden-to-hidden
        # Convolutional GRU model function
        def conv_gru(dh, x):
            zt = C.sigmoid(bz + C.convolution(Wz, x) + C.convolution(Uz, dh))  # update gate z(t)
            rt = C.sigmoid(br + C.convolution(Wr, x) + C.convolution(Ur, dh))  # reset gate r(t)
            rs = dh * rt                                                       # hidden state after reset
            ht = zt * dh + (1 - zt) * activation(bh + C.convolution(Wh, x) + C.convolution(Uh, rs))
            return ht
        return conv_gru
    

    and here is how to use it:

    import numpy as np

    x = C.sequence.input_variable((3, 224, 224))
    z = C.layers.Recurrence(ConvolutionalGRU((3,3), 32), initial_state=C.constant(0, (32,224,224)))
    y = z(x)
    x0 = np.random.randn(16, 3, 224, 224).astype('f')  # a single sequence with 16 random "frames"
    output = y.eval({x: x0})
    output[0].shape   # -> (16, 32, 224, 224)
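
    Note how the initial state is passed explicitly as a zero tensor with the full hidden shape (32, 224, 224), matching the number of output channels and the spatial size of the frames. This is the point the update warns about: the hidden-to-hidden convolutions need a state of concrete shape at the first step, so the usual scalar default for initial_state presumably won't do here.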