I am new to CNTK and am using its awesome Python API. I am having trouble figuring out how to define a recurrent convolutional network layer, since Recurrence() seems to assume a regular network layer only.
To be more specific, I would like to have recurrence among convolutional layers.
Any pointer or even a simple example would be highly appreciated. Thank you.
There are two meaningful ways to do this (i.e. ways that do not destroy the structure of natural images that convolutions rely on). The simplest is to put an LSTM after the final convolutional layer, i.e.
convnet = C.layers.Sequential([C.layers.Convolution(...), C.layers.MaxPooling(...), C.layers.Convolution(...), ...])
z = C.layers.Sequential([convnet, C.layers.Recurrence(C.layers.LSTM(100)), C.layers.Dense(10)])
for a 10-class problem.
The more complex way is to define your own recurrent cell that uses only convolutions and thus respects the structure of natural images. To define a recurrent cell, you write a function that takes the previous state and an input (i.e. a single frame if you are processing video) and returns the next state and output. For example, you can look at the implementation of the GRU in the CNTK layers module and adapt it to use convolution instead of times everywhere. If this is what you want, I can try to provide such an example. However, I encourage you to try the simple way first.
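The contract described above (a step function that maps the previous state and the current frame to the next state) can be illustrated without CNTK. The following is only a plain-Python sketch of what a recurrence wrapper does conceptually; `run_recurrence` and `leaky_average` are hypothetical names made up for illustration and are not part of the CNTK API.

```python
from typing import Callable, List, TypeVar

S = TypeVar("S")  # state type
X = TypeVar("X")  # input (frame) type


def run_recurrence(step: Callable[[S, X], S], initial_state: S, frames: List[X]) -> List[S]:
    """Apply step(prev_state, frame) along the sequence and collect every state."""
    states = []
    state = initial_state
    for frame in frames:
        state = step(state, frame)  # the next state depends on the previous one
        states.append(state)
    return states


def leaky_average(prev_state: float, frame: float) -> float:
    # Toy cell standing in for a real GRU: blend the old state with the new input.
    return 0.5 * prev_state + 0.5 * frame


# Three scalar "frames" and a zero initial state.
states = run_recurrence(leaky_average, 0.0, [1.0, 1.0, 1.0])
print(states)
```

In a convolutional cell the state and the frames would be image-shaped tensors rather than scalars, but the control flow is the same.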
Update: I wrote a barebones convolutional GRU. You need to pay special attention to how the initial state is defined, but otherwise it seems to work fine. Here's the layer definition:
import cntk as C

def ConvolutionalGRU(kernel_shape, outputs, activation=C.tanh, init=C.glorot_uniform(), init_bias=0, name=''):
    # filter layout: (output channels, input channels (inferred), kernel height, kernel width)
    conv_filter_shape = (outputs, C.InferredDimension) + kernel_shape
    bias_shape = (outputs, 1, 1)  # one bias per output channel, broadcast over space
    # parameters
    bz = C.Parameter(bias_shape, init=init_bias, name='bz')  # update gate bias
    br = C.Parameter(bias_shape, init=init_bias, name='br')  # reset gate bias
    bh = C.Parameter(bias_shape, init=init_bias, name='bh')  # candidate state bias
    Wz = C.Parameter(conv_filter_shape, init=init, name='Wz')  # input
    Wr = C.Parameter(conv_filter_shape, init=init, name='Wr')  # input
    Uz = C.Parameter(conv_filter_shape, init=init, name='Uz')  # hidden-to-hidden
    Ur = C.Parameter(conv_filter_shape, init=init, name='Ur')  # hidden-to-hidden
    Wh = C.Parameter(conv_filter_shape, init=init, name='Wh')  # input
    Uh = C.Parameter(conv_filter_shape, init=init, name='Uh')  # hidden-to-hidden

    # Convolutional GRU model function: dh is the previous state, x the current frame
    def conv_gru(dh, x):
        zt = C.sigmoid(bz + C.convolution(Wz, x) + C.convolution(Uz, dh))  # update gate z(t)
        rt = C.sigmoid(br + C.convolution(Wr, x) + C.convolution(Ur, dh))  # reset gate r(t)
        rs = dh * rt  # hidden state after reset
        ht = zt * dh + (1 - zt) * activation(bh + C.convolution(Wh, x) + C.convolution(Uh, rs))
        return ht

    return conv_gru
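Written out as equations, the cell above computes the following, where $*$ denotes convolution, $\odot$ elementwise multiplication, $\sigma$ the sigmoid, $\phi$ the `activation` argument (tanh by default), $x_t$ the current frame and $h_{t-1}$ the previous state (`dh` in the code). This is just a transcription of the code, not an independent derivation.

```latex
\begin{aligned}
z_t &= \sigma\left(b_z + W_z * x_t + U_z * h_{t-1}\right) \\
r_t &= \sigma\left(b_r + W_r * x_t + U_r * h_{t-1}\right) \\
h_t &= z_t \odot h_{t-1} + (1 - z_t) \odot \phi\left(b_h + W_h * x_t + U_h * (r_t \odot h_{t-1})\right)
\end{aligned}
```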
And here is how to use it:
import numpy as np

x = C.sequence.input_variable((3, 224, 224))
z = C.layers.Recurrence(ConvolutionalGRU((3, 3), 32), initial_state=C.constant(0, (32, 224, 224)))
y = z(x)
x0 = np.random.randn(16, 3, 224, 224).astype('f')  # a single sequence of 16 random "frames"
output = y.eval({x: x0})
print(output[0].shape)  # (16, 32, 224, 224)
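The initial state in the usage above has shape (32, 224, 224) because the hidden state has one channel per output filter and, in this example, the same height and width as each input frame. A small plain-Python check of that bookkeeping is sketched below. It assumes the convolutions preserve the spatial size, as the reported output shape suggests; `conv_gru_state_shape` and `check_initial_state` are hypothetical helpers, not CNTK functions.

```python
from typing import Tuple

Shape = Tuple[int, int, int]  # (channels, height, width)


def conv_gru_state_shape(frame_shape: Shape, outputs: int) -> Shape:
    """Expected hidden-state shape, assuming the convolutions keep height and width."""
    _channels, height, width = frame_shape
    return (outputs, height, width)


def check_initial_state(initial_state_shape: Shape, frame_shape: Shape, outputs: int) -> Shape:
    """Raise early if the initial state does not match the expected state shape."""
    expected = conv_gru_state_shape(frame_shape, outputs)
    if tuple(initial_state_shape) != expected:
        raise ValueError(
            "initial state has shape %r, expected %r" % (tuple(initial_state_shape), expected)
        )
    return expected


# Same numbers as the usage example: 3-channel 224x224 frames, 32 output filters.
state_shape = check_initial_state((32, 224, 224), (3, 224, 224), outputs=32)
print(state_shape)
```

Running a check like this before building the graph can make a mismatched initial state easier to diagnose than a shape error raised inside the recurrence.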