Tags: python, cntk, recurrent-neural-network

How to define a Recurrent Convolutional network layer in CNTK?


I am new to CNTK and using its awesome Python API. I am having trouble figuring out how to define a recurrent convolutional network layer, since Recurrence() seems to assume a regular network layer only.

To be more specific, I would like to have recurrence among convolutional layers.

Any pointers or even a simple example would be highly appreciated. Thank you.


Solution

  • There are two ways to do this in a meaningful way (i.e. without destroying the structure of natural images that convolutions rely on). The simplest is to just have an LSTM at the final layer, i.e.

    convnet = C.layers.Sequential([C.layers.Convolution(...), C.layers.MaxPooling(...), C.layers.Convolution(...), ...])
    z = C.layers.Sequential([convnet, C.layers.Recurrence(C.layers.LSTM(100)), C.layers.Dense(10)])
    

    for a 10-class problem.
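
    To make that concrete, here is a hedged, self-contained sketch of the same structure; the frame size (3x32x32), filter counts, and pooling geometry are my own illustrative choices, not part of the answer:

    import cntk as C

    # per-frame convolutional feature extractor (sizes are illustrative)
    with C.layers.default_options(activation=C.relu, pad=True):
        convnet = C.layers.Sequential([
            C.layers.Convolution2D((3,3), 16),
            C.layers.MaxPooling((2,2), strides=(2,2)),
            C.layers.Convolution2D((3,3), 32),
            C.layers.MaxPooling((2,2), strides=(2,2)),
        ])
    # convnet features -> LSTM over the sequence axis -> per-frame class scores
    z = C.layers.Sequential([convnet,
                             C.layers.Recurrence(C.layers.LSTM(100)),
                             C.layers.Dense(10)])
    x = C.sequence.input_variable((3, 32, 32))  # a sequence of RGB frames
    y = z(x)                                    # one 10-way prediction per frame

    (To get a single label per sequence instead of one per frame, you could insert C.sequence.last between the recurrence and the Dense layer.)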

    The more complex way would be to define your own recurrent cell that only uses convolutions and thus respects the structure of natural images. To define a recurrent cell, you need to write a function that takes the previous state and an input (i.e. a single frame, if you are processing video) and outputs the next state and output. For example, you can look into the implementation of the GRU in the CNTK layers module and adapt it to use convolution instead of times everywhere. If this is what you want, I can try to provide such an example. However, I encourage you to try the simple way first.
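
    For orientation, the contract Recurrence expects is a function mapping (previous state, current input) to the next state. Here is a minimal sketch of the simplest such convolutional cell, a plain tanh RNN step; the function and parameter names here are mine, for illustration only, and the full GRU version follows below:

    import cntk as C

    def ConvolutionalRNNStep(kernel_shape, outputs, activation=C.tanh, init=C.glorot_uniform()):
        # Both the input-to-hidden and hidden-to-hidden maps are convolutions,
        # so the spatial structure of the frames is preserved.
        conv_filter_shape = (outputs, C.InferredDimension) + kernel_shape
        W = C.Parameter(conv_filter_shape, init=init, name='W')  # input-to-hidden
        U = C.Parameter(conv_filter_shape, init=init, name='U')  # hidden-to-hidden
        b = C.Parameter((outputs, 1, 1), init=0, name='b')       # bias
        def rnn_step(dh, x):  # dh: previous state, x: current frame
            return activation(b + C.convolution(W, x) + C.convolution(U, dh))
        return rnn_step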

    Update: I wrote a barebones convolutional GRU. You need to pay special attention to how the initial state is defined, but otherwise it seems to work fine. Here's the layer definition:

    import cntk as C

    def ConvolutionalGRU(kernel_shape, outputs, activation=C.tanh, init=C.glorot_uniform(), init_bias=0, name=''):
        conv_filter_shape = (outputs, C.InferredDimension) + kernel_shape
        bias_shape = (outputs, 1, 1)
        # parameters
        bz = C.Parameter(bias_shape, init=init_bias, name='bz')   # update-gate bias
        br = C.Parameter(bias_shape, init=init_bias, name='br')   # reset-gate bias
        bh = C.Parameter(bias_shape, init=init_bias, name='bh')   # candidate bias
        Wz = C.Parameter(conv_filter_shape, init=init, name='Wz') # input
        Wr = C.Parameter(conv_filter_shape, init=init, name='Wr') # input
        Uz = C.Parameter(conv_filter_shape, init=init, name='Uz') # hidden-to-hidden
        Ur = C.Parameter(conv_filter_shape, init=init, name='Ur') # hidden-to-hidden
        Wh = C.Parameter(conv_filter_shape, init=init, name='Wh') # input
        Uh = C.Parameter(conv_filter_shape, init=init, name='Uh') # hidden-to-hidden
        # Convolutional GRU model function
        def conv_gru(dh, x):
            zt = C.sigmoid(bz + C.convolution(Wz, x) + C.convolution(Uz, dh))  # update gate z(t)
            rt = C.sigmoid(br + C.convolution(Wr, x) + C.convolution(Ur, dh))  # reset gate r(t)
            rs = dh * rt                                                       # hidden state after reset
            ht = zt * dh + (1 - zt) * activation(bh + C.convolution(Wh, x) + C.convolution(Uh, rs))
            return ht
        return conv_gru
    

    and here is how to use it:

    import numpy as np

    x = C.sequence.input_variable((3, 224, 224))
    z = C.layers.Recurrence(ConvolutionalGRU((3,3), 32), initial_state=C.constant(0, (32,224,224)))
    y = z(x)
    x0 = np.random.randn(16, 3, 224, 224).astype('f')  # a single sequence with 16 random "frames"
    output = y.eval({x: x0})
    output[0].shape   # -> (16, 32, 224, 224)
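
    Note how the initial state is passed explicitly as a zero tensor with the full hidden shape (32, 224, 224), matching the number of output channels and the spatial size of the frames. This is the point the update warns about: the hidden-to-hidden convolutions need a state of concrete shape at the first step, so the usual scalar default for initial_state presumably won't do here.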