python · deep-learning · keras · recurrent-neural-network · sequential

Keras Sequential TimeDistributed model: extreme result differences between sequence lengths 1 and 2


I have 2 models which I train on 2 essentially identical self-made datasets, one with sequence length 1 and one with sequence length 2. In the first case it converges like a charm and practically figures out my generating procedure; in the second case it does little better than chance. What am I doing wrong? Anything could be helpful.

Data generating code

import numpy as np

def make_other_date(samples=720, sequence=1, features=100):
    # Binary 0/1 features; the label encodes whether the feature sum exceeds 50
    y_train = np.zeros((samples, sequence, 2))
    x_train = np.random.randint(2, size=(samples, sequence, features))
    for i_sample in range(samples):
        for i_sequence in range(sequence):
            if np.sum(x_train[i_sample, i_sequence, :]) > 50:
                y_train[i_sample, :, :] = np.array([0, 1])
            else:
                y_train[i_sample, :, :] = np.array([1, 0])
    return x_train - 0.5, y_train  # -0.5 to make mean = 0

nsequence = 1
x_train, y_train = make_other_date(36000, sequence=nsequence)
x_val, y_val = make_other_date(360, sequence=nsequence)
print(x_train.shape, y_train.shape)  # (36000, 1, 100) (36000, 1, 2)

Model

from keras.models import Sequential
from keras.layers import Activation, Dense, TimeDistributed

model = Sequential()
model.add(TimeDistributed(Dense(10), batch_input_shape=(None, nsequence, 100)))
model.add(TimeDistributed(Dense(10)))  # unnecessary
model.add(TimeDistributed(Dense(2)))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')
print(model.output_shape)  # (None, 1, 2)

Results nsequence = 1

Epoch 10/10
28800/28800 [==============================] - 3s - loss: 3.4264e-05 - val_loss: 2.4744e-05

Results nsequence = 2

Epoch 10/10
28800/28800 [==============================] - 3s - loss: 0.6053 - val_loss: 0.6042

Solution

  • There is something wrong with the formulation of the problem. I'm going to try to explain why your example cannot work, and then you can build another one if you like.

    On the data part, when you produce the dataset:

    for i_sequence in range(sequence):
        if np.sum(x_train[i_sample,i_sequence,:]) > 50:
            y_train[i_sample,:,:] = np.array([0,1])
        else:
            y_train[i_sample,:,:] = np.array([1,0])
    

    you define the target for the whole sequence based only on the last element of that sequence. y_train[i_sample,0,:] is overwritten by the last iteration of the loop, since you update the entire slice y_train[i_sample,:,:] every time you move forward in the sequence.

    So: you have ONE target for the whole sequence, which depends only on the last element of this sequence.
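
    You can verify this directly. A minimal check (using the make_other_date function and the numpy import from your question) shows that both time steps of every sample carry the same label, and that the label always matches the condition on the last time step:

    x, y = make_other_date(1000, sequence=2)
    # every sample's two targets are identical...
    print(np.all(y[:, 0, :] == y[:, 1, :]))  # True
    # ...and determined by the last time step alone (undo the -0.5 shift first)
    last_sums = np.sum(x[:, -1, :] + 0.5, axis=-1)
    print(np.all((last_sums > 50) == (y[:, -1, 1] == 1)))  # True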

    Now, on the model part:

    Your model consists only of TimeDistributed(Dense()) layers. By definition, this is a wrapper that applies the same dense layer to every element of your sequence. Those dense layers share weights, so the one applied to the first element of your sequence is exactly the same as the one applied to the last.
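
    You can see the weight sharing concretely with a small self-contained sketch (the names demo, x, out and the layer sizes are just for illustration): the wrapped layer holds a single kernel and bias, so identical time steps always produce identical outputs:

    import numpy as np
    from keras.models import Sequential
    from keras.layers import Dense, TimeDistributed

    demo = Sequential()
    demo.add(TimeDistributed(Dense(2), batch_input_shape=(None, 2, 100)))
    print(len(demo.get_weights()))  # 2 arrays (one kernel, one bias), not one set per time step
    x = np.repeat(np.random.randint(2, size=(1, 1, 100)) - 0.5, 2, axis=1)  # two identical time steps
    out = demo.predict(x)
    print(np.allclose(out[0, 0], out[0, 1]))  # True: same weights, same input, same output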

    Now, if you think about it: to decide which target to assign to the first element of your sequence, your network needs to know what happens on the last element, since that is how you defined the dataset.

    Imagine that one of your sequences (call it seq_i) is such that

    np.sum(x_train[seq_i,0,:]) = 52
    np.sum(x_train[seq_i,1,:]) = 49
    

    then your target for this sequence is

    y_train[seq_i,0] = [1,0]
    y_train[seq_i,1] = [1,0]
    

    Suppose the dense layer learns to predict the target [1,0] whenever the input sum is not greater than 50, just as you want for the second element of your sequence. Since the same layer is applied to the first element (whose sum is 52), it will predict [0,1] there and get punished for it during training, because the target is [1,0]. The weights will be pushed back and forth and the network won't learn anything.
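
    If you want an example that this architecture can actually learn, label each time step from its own features instead of overwriting the whole sequence. A minimal sketch of such a generator (my suggestion, not your original code):

    def make_fixed_data(samples=720, sequence=2, features=100):
        y_train = np.zeros((samples, sequence, 2))
        x_train = np.random.randint(2, size=(samples, sequence, features))
        for i_sample in range(samples):
            for i_sequence in range(sequence):
                # label THIS time step only, from its own feature sum
                if np.sum(x_train[i_sample, i_sequence, :]) > 50:
                    y_train[i_sample, i_sequence, :] = np.array([0, 1])
                else:
                    y_train[i_sample, i_sequence, :] = np.array([1, 0])
        return x_train - 0.5, y_train

    With this target, the same shared dense layer is the right function for every time step, and the sequence-length-2 model should converge just like the sequence-length-1 one.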

    Is it clear?