Sorry, I am new to RNNs. I have read this post on the TimeDistributed layer.
I have reshaped my data into the Keras-required [samples, time_steps, features] format: [140, 50, 19], which means I have 140 data points, each with 50 time steps and 19 features. My output is shaped [140, 50, 1]. I care most about the accuracy of the last time step's prediction. This is a regression problem.
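For concreteness, a sketch of the array shapes described above (random placeholder data, not the real dataset):

import numpy as np
X_train = np.random.rand(140, 50, 19)  # [samples, time_steps, features]
y_train = np.random.rand(140, 50, 1)   # one regression target per time step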
My current code is:

# Keras 1.x API (dropout_W and consume_less are Keras 1 arguments)
from keras.layers import Input, Dense, LSTM
from keras.models import Model
from keras.optimizers import SGD

x = Input((None, X_train.shape[-1]), name='input')
lstm_kwargs = {'dropout_W': 0.25, 'return_sequences': True, 'consume_less': 'gpu'}
lstm1 = LSTM(64, name='lstm1', **lstm_kwargs)(x)
output = Dense(1, activation='relu', name='output')(lstm1)  # one value per time step
model = Model(input=x, output=output)
sgd = SGD(lr=0.00006, momentum=0.8, decay=0, nesterov=False)
model.compile(optimizer=sgd, loss='mean_squared_error')
My questions are:

1. Since both my input and output have 50 time steps, this is many-to-many, so should I use return_sequences=True? And if I only need the last time step's prediction, that would be many-to-one, so would I need my output to be shaped [140, 1, 1] and set return_sequences=False?
2. I have tried using a TimeDistributed layer, as in

output = TimeDistributed(Dense(1, activation='relu'), name='output')(lstm1)

but the performance seems to be worse than without the TimeDistributed layer (the two output heads are sketched after this list). Why is this so?
3. I tried optimizer=RMSprop(lr=0.001). I thought RMSprop was supposed to stabilize the NN, but I was never able to get good results using RMSprop.
4. How do I choose a good lr and momentum for SGD? I have been testing different combinations manually. Is there a cross-validation method in Keras?
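For reference on question 2, a sketch of the two output heads being compared, reusing lstm1 from the code above (note: in Keras 2 a Dense layer applied to 3D input acts on the last axis, so the two heads are mathematically equivalent and any performance gap is likely training noise):

from keras.layers import Dense, TimeDistributed

# Head A: Dense applied directly to the sequence output of the LSTM
output_a = Dense(1, activation='relu', name='output_a')(lstm1)
# Head B: the same Dense wrapped in TimeDistributed
output_b = TimeDistributed(Dense(1, activation='relu'), name='output_b')(lstm1)
# Both map each sample's [50, 64] sequence of LSTM states to [50, 1] predictions.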
Answer:

So:

1. return_sequences=False makes your network output only the last element of the sequence prediction.
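A minimal sketch of that many-to-one variant, under the same Keras 1.x API as the question (the target must then also be cut down to each sequence's last value):

from keras.layers import Input, Dense, LSTM
from keras.models import Model

# With return_sequences=False the LSTM emits only its final hidden state,
# so the Dense head yields one prediction per sample: output shape [140, 1].
x = Input((None, X_train.shape[-1]), name='input')
lstm1 = LSTM(64, dropout_W=0.25, return_sequences=False, name='lstm1')(x)
output = Dense(1, activation='relu', name='output')(lstm1)
model = Model(input=x, output=output)
model.compile(optimizer='sgd', loss='mean_squared_error')

y_train_last = y_train[:, -1, :]  # keep only the last time step: shape [140, 1]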
2. RMSProp as a first choice for RNNs is a rule of thumb, not a generally proven law. Moreover, it is strongly advised not to change its parameters, so this might be causing the problems. Another thing is that an LSTM needs a lot of time to stabilize; maybe you need to leave it training for more epochs. A last thing is that maybe your data would favour another activation function.
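As an illustration of the "keep RMSProp at its defaults and train longer" advice (the epoch count here is an arbitrary placeholder):

from keras.optimizers import RMSprop

model.compile(optimizer=RMSprop(), loss='mean_squared_error')  # default parameters
model.fit(X_train, y_train, nb_epoch=200, batch_size=32)  # nb_epoch is the Keras 1.x name (epochs in Keras 2)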
3. There is no built-in cross-validation in Keras itself, but you can use its scikit-learn wrapper (keras.wrappers.scikit_learn) and run scikit-learn's grid search and cross-validation on top of it.
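A minimal sketch of that approach (build_model is a hypothetical factory function, the grid values are placeholders, and in older scikit-learn GridSearchCV lives in sklearn.grid_search instead of sklearn.model_selection):

from keras.layers import Input, Dense, LSTM
from keras.models import Model
from keras.optimizers import SGD
from keras.wrappers.scikit_learn import KerasRegressor
from sklearn.model_selection import GridSearchCV

def build_model(lr=0.001, momentum=0.8):
    # Rebuilds the question's model for one (lr, momentum) combination.
    x = Input((None, X_train.shape[-1]), name='input')
    lstm1 = LSTM(64, return_sequences=True, name='lstm1')(x)
    output = Dense(1, activation='relu', name='output')(lstm1)
    model = Model(input=x, output=output)
    model.compile(optimizer=SGD(lr=lr, momentum=momentum), loss='mean_squared_error')
    return model

estimator = KerasRegressor(build_fn=build_model, nb_epoch=50, batch_size=32, verbose=0)
param_grid = {'lr': [1e-5, 6e-5, 1e-4], 'momentum': [0.5, 0.8, 0.9]}
search = GridSearchCV(estimator, param_grid, cv=3)  # scores via KerasRegressor's negative loss
search.fit(X_train, y_train)
print(search.best_params_)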