I am new to this field, and I was reading the paper "Predicting citation counts based on deep neural network learning techniques". In it, the authors describe the model they implemented so that others can reproduce the results. I tried to do this, but I am not sure whether I succeeded.
Here is their description:
- RNN module - SimpleRNN
- Output dimension of the encoder - 512
- The output layer - Dense layer
- Activation function - ReLU
- Overfitting prevention technique - Dropout with 0.2 rate
- Epochs - 100
- Optimization algorithm - RMSProp
- Learning rate - 10^{-5}
- Batch size - 256
And here is my implementation. I am not sure whether the model I created is sequence-to-sequence.
epochs = 100
batch_size = 256
optimizer = keras.optimizers.RMSprop(learning_rate=1e-5)

model = keras.models.Sequential([
    keras.layers.SimpleRNN(512, input_shape=[X_train.shape[0], X_train.shape[1]],
                           activation='relu', return_sequences=True, dropout=0.2),
    keras.layers.Dense(9)
])
model.compile(loss='mse', optimizer=optimizer,
              metrics=[keras.metrics.RootMeanSquaredError()])
The summary of this model is:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
simple_rnn (SimpleRNN) (None, 154521, 512) 266240
_________________________________________________________________
dense (Dense) (None, 154521, 9) 4617
=================================================================
Total params: 270,857
Trainable params: 270,857
Non-trainable params: 0
_________________________________________________________________
Update: is this perhaps the correct way to formulate it?
encoder = keras.layers.SimpleRNN(512,
                                 input_shape=[X_train.shape[0], X_train.shape[1]],
                                 activation='relu',
                                 return_sequences=False,
                                 dropout=0.2)
decoder = keras.layers.SimpleRNN(512,
                                 input_shape=[X_train.shape[0], X_train.shape[1]],
                                 activation='relu',
                                 return_sequences=True,
                                 dropout=0.2)
output = keras.layers.Dense(9)(decoder)
This is the dataset that I am using.
year venue c1 c2 c3 c4 c5 c6 c7 c8 c9 c10 c11 c12 c13 c14
1989 234 0 1 2 3 4 5 5 5 5 8 8 10 11 12
1989 251 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1990 346 0 0 0 0 0 0 0 0 0 0 0 0 0 0
As input I need to give all the columns up to c5, and predict the remaining c columns (the citation counts for the upcoming years). Is this the right way to go forward?
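A minimal sketch of that input/target split, assuming the table is loaded as a NumPy array with the columns in the order shown (year, venue, c1..c14); the three rows are the sample rows above:

import numpy as np

data = np.array([
    [1989, 234, 0, 1, 2, 3, 4, 5, 5, 5, 5, 8, 8, 10, 11, 12],
    [1989, 251, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [1990, 346, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
], dtype="float32")

X = data[:, :7]   # year, venue, c1..c5 -> model input
y = data[:, 7:]   # c6..c14, 9 values -> targets, matching Dense(9)

print(X.shape)  # (3, 7)
print(y.shape)  # (3, 9)

The 9 target columns are what makes Dense(9) the right output width here.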
Your model is a token classification model, not sequence-to-sequence.
A seq2seq model comprises an encoder and a decoder (both are RNNs in your case). It cannot be built with the Sequential API, because the encoder and the decoder have separate inputs.
The encoder should be created with return_sequences=False.
The Dense layer should follow the decoder.
It should look something like this:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers import Input

encoder_input = Input(shape=(None, 512))
decoder_input = Input(shape=(None, 512))

encoder_output = keras.layers.SimpleRNN(512,
                                        activation='relu',
                                        return_sequences=False,
                                        dropout=0.2)(encoder_input)
# Add a time axis so the encoder summary can be prepended to the decoder input.
encoder_output = encoder_output[:, tf.newaxis, ...]
decoder_inputs = tf.concat([encoder_output, decoder_input], 1)

decoder_output = keras.layers.SimpleRNN(512,
                                        activation='relu',
                                        return_sequences=True,
                                        dropout=0.2)(decoder_inputs)
output = keras.layers.Dense(9)(decoder_output)

model_att = tf.keras.models.Model([encoder_input, decoder_input], output)
# Citation-count prediction is regression, so use MSE, not a classification loss.
model_att.compile(optimizer=keras.optimizers.Adam(), loss='mse')
model_att.summary()