Tags: deep-learning, lstm, cntk

Seq2Seq in CNTK: Runtime Error: Function only supports 2 dynamic axes


I am trying to implement a basic translation model in CNTK using LSTMs, where the input and output are sentences in different languages.


To achieve this I am creating the model as follows:

import cntk as c
from cntk import Axis, input_variable, sequence

def create_model(x):
    with c.layers.default_options():
        m = c.layers.Recurrence(c.layers.LSTM(input_vocab_size))(x)
        m = sequence.last(m)   # reduce the input sequence to its last value
        y = c.layers.Recurrence(c.layers.LSTM(label_vocab_size))(m)
        return y

batch_axis = Axis.default_batch_axis()
input_seq_axis = Axis('inputAxis')
input_dynamic_axes = [batch_axis, input_seq_axis]
raw_input = input_variable(shape = (input_vocab_dim), dynamic_axes = input_dynamic_axes, name = 'raw_input')
z = create_model(raw_input)

But I am getting the following error:

RuntimeError: Currently PastValue/FutureValue Function only supports input operand with 2 dynamic axis (1 sequence-axis and 1 batch-axis)

As I understand it, dynamic axes are the axes whose sizes are only decided once the data is loaded — in this case the batch size and the length of the input sentence. I don't think I am changing the dynamic axes of the input anywhere.
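The catch is that the input to the second recurrence no longer has both dynamic axes. A plain-NumPy analogy (not CNTK API) of what sequence.last does conceptually: taking the last step of every sequence drops the sequence axis entirely, leaving only the batch axis, and a recurrence needs a sequence axis to run over.

```python
import numpy as np

# Plain-NumPy analogy (not CNTK API): a padded batch of sequences has a
# batch axis and a sequence (time) axis, plus a static feature axis.
batch, seq_len, feat = 4, 7, 3
x = np.random.rand(batch, seq_len, feat)

# Taking the last step of every sequence (what sequence.last does
# conceptually) removes the sequence axis entirely:
thought_vector = x[:, -1, :]

print(x.shape)               # (4, 7, 3) -> batch, sequence, feature
print(thought_vector.shape)  # (4, 3)    -> the sequence axis is gone
```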

Any help is highly appreciated.


Solution

  • The last() operation strips the dynamic axis, since it reduces the input sequence to a single value (the thought vector).

    The thought vector should then become the initial state for the second recurrence. So it should not be passed as the data argument to the second recurrence.

    In the current version, the initial_state argument of Recurrence() cannot be data-dependent. This will soon be possible: it is already under code review and will shortly be merged to master.

    Until then, there is a more involved way to pass a data-dependent initial state: construct the recurrence manually (without the Recurrence() layer) and splice the initial hidden state into the recurrence yourself. This is illustrated in the sequence-to-sequence sample.
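To make the idea concrete, here is a hypothetical plain-Python sketch (not the CNTK API) of what "manually constructing the recurrence" means: the loop over time steps is written out by hand, so the initial hidden state can be any data-dependent value such as the encoder's thought vector. The names `manual_recurrence` and `step` are illustrative stand-ins for the LSTM wiring in the actual sample.

```python
import numpy as np

# Illustrative sketch of a hand-built recurrence: instead of a Recurrence()
# layer, the loop over the sequence is explicit, so the initial hidden
# state can be the data-dependent thought vector from the encoder.
def manual_recurrence(step_fn, inputs, initial_state):
    """Run step_fn over the sequence, seeding the state with initial_state."""
    h = initial_state          # thought vector from the encoder
    outputs = []
    for x_t in inputs:         # plays the role of the past_value() wiring
        h = step_fn(x_t, h)
        outputs.append(h)
    return outputs

# Toy step function standing in for the LSTM cell:
step = lambda x, h: np.tanh(x + h)

thought_vector = np.ones(3)          # encoder output; depends on the data
decoder_inputs = [np.zeros(3)] * 5   # five decoding steps
out = manual_recurrence(step, decoder_inputs, thought_vector)
print(len(out))  # 5
```

In CNTK itself the same pattern is expressed with placeholders for the previous hidden and cell state that are then replaced by past_value() of the cell's outputs, as shown in the sequence-to-sequence sample.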