Tags: tensorflow, neural-network, recurrent-neural-network, tflearn

Computing sum of sequence - Recurrent network


I have been trying to implement a recurrent network that computes the sum of a sequence of numbers. Eventually I plan to make it accept variable-length sequences, but to start off, the input length is fixed at 5.

Example:

[1,2,3,4,5] = 15

The problem I am encountering is that once it converges, or at least once the loss stabilizes, it produces the same output for any input I give it.

Example:

[3,4,5,1,1] = 134.59681
[400,1,1,1,1] = 134.59681
[32,42,55,1,1] = 134.59681
[100,1,2,1,1] = 134.59681

So far I have tried different layer sizes, different activation functions, and different learning rates, but they all result in similar behavior. Even when the output value changes (instead of 134.59681 it might be -12, or whatever), it is still the same for every input.

I assume it's possible to solve this problem with a recurrent neural network using linear activations.
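
For intuition on why linear activations should suffice (an aside with hand-set weights, not a learned model): a single recurrent unit with state update h_t = w_h * h_{t-1} + w_x * x_t reproduces the sum exactly when w_h = w_x = 1 and the bias is 0, since then h_T = x_1 + ... + x_T. A minimal sketch:

def linear_rnn_sum(xs, w_h=1.0, w_x=1.0, b=0.0):
    # one linear recurrent unit: h_t = w_h * h_{t-1} + w_x * x_t + b
    h = 0.0
    for x in xs:
        h = w_h * h + w_x * x + b
    return h

print(linear_rnn_sum([1, 2, 3, 4, 5]))  # 15.0 -- exactly the sequence sum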

Why does the network converge to a "fixed" value?

import numpy as np
import tflearn

# 512 random sequences of length 5; targets are the sequence sums
sample_size = 512
X = np.random.randint(1, 50, size=(sample_size, 5))
Y = [[np.sum(x)] for x in X]
X = np.reshape(X, (-1, 5, 1))

net = tflearn.input_data(shape=[None, 5, 1])
net = tflearn.lstm(net, 32, dropout=0.9)
net = tflearn.fully_connected(net, 1, activation='linear')

regression = tflearn.regression(net, optimizer='adam', loss='mean_square', learning_rate=1.)

m = tflearn.DNN(regression, tensorboard_dir='tnsbrd-logs/')
m.fit(X, Y, n_epoch=2000, show_metric=True, snapshot_epoch=False)
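
As an aside on the constant output: with a mean-square loss, the best constant prediction is the mean of the targets. Each entry here is drawn uniformly from 1..49, so the expected sum is about 5 * 25 = 125, and 134.59681 is in that neighborhood. One plausible reading is that the network collapses to predicting roughly the mean target, which the aggressive learning rate of 1.0 and LSTM gates saturating on large inputs would both encourage. A quick check of that target mean:

print(np.mean([np.sum(x) for x in X]))  # roughly 125, near the constant output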

Solution

  • Using a simple_rnn layer instead of an lstm layer solved the problem. I also ended up using just one unit in the recurrent layer; since there is one input and one output at each step, this made sense.

    The code looks like this now:

    import numpy as np
    import tflearn

    # 64 random float sequences of length 5; targets are the sequence sums
    sample_size = 64
    max_len = 5

    X = np.random.randint(1, 50, size=(sample_size, max_len)) + 0.0
    Y = [[np.sum(x)] for x in X]
    X = np.reshape(X, (-1, max_len, 1))

    # a single linear recurrent unit is enough to accumulate a running sum
    net = tflearn.input_data(shape=[None, max_len, 1])
    net = tflearn.simple_rnn(net, 1, activation='linear', weights_init='normal')

    regression = tflearn.regression(net, optimizer='adagrad', loss='mean_square', learning_rate=.06, metric='R2')

    m = tflearn.DNN(regression, tensorboard_dir='tnsbrd-logs/')
    m.fit(X, Y, n_epoch=10000, show_metric=True, snapshot_epoch=False)
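
    Once training finishes, a quick sanity check (a hypothetical usage example; predict takes the same [batch, max_len, 1] shape as the training data):

    test = np.reshape([1.0, 2.0, 3.0, 4.0, 5.0], (-1, max_len, 1))
    print(m.predict(test))  # should be close to [[15.0]]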