I am trying to replace my FFNN with an LSTM layer. As input I get 360 lidar data points and 4 additional values for distance etc. The algorithm shall learn to navigate a robot through its environment. With the FFNN it works absolutely fine, and for the LSTM I started like this:
# collected data for RL
scan_range = []  # filled with .append, length=360
state = scan_range + [heading, current_distance, obstacle_min_range, obstacle_angle]
return np.asarray(state)  # state vector of length 364
Based on that data, there is some analysis for the next state, whether the goal is achieved, etc. The data is stored in memory:
agent.appendMemory(state, action, reward, next_state, done)
which will do:
self.memory.append((state, action, reward, next_state, done))
The action and reward are plain numbers, and next_state is again an array.
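For context, a minimal sketch of how such a replay memory can be set up (assuming self.memory is a collections.deque; the Agent class name and the maxlen value are illustrative, only appendMemory and self.memory appear in my actual code):

from collections import deque

class Agent(object):  # illustrative class name
    def __init__(self, memory_size=100000):  # memory_size is an assumed value
        # a deque drops the oldest transitions once maxlen is reached
        self.memory = deque(maxlen=memory_size)

    def appendMemory(self, state, action, reward, next_state, done):
        # one transition per tuple, as above
        self.memory.append((state, action, reward, next_state, done))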
Next, I build up the neural network with the LSTM layer:
from keras.models import Sequential
from keras.layers import SimpleRNN, Dense, Activation
from keras.optimizers import RMSprop

model = Sequential()
model.add(SimpleRNN(64, input_shape=(1, 364)))  # recurrent layers expect 3-D input: (batch, timesteps, features)
model.add(Dense(self.action_size, kernel_initializer='lecun_uniform'))
model.add(Activation('linear'))
model.compile(loss='mse', optimizer=RMSprop(lr=self.learning_rate, rho=0.9, epsilon=1e-06))
model.summary()
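As a quick standalone sanity check (not part of the agent code; the action size of 5 is an assumed value), you can confirm what input shape such a model expects by feeding it dummy data:

import numpy as np
from keras.models import Sequential
from keras.layers import SimpleRNN, Dense, Activation

model = Sequential()
model.add(SimpleRNN(64, input_shape=(1, 364)))
model.add(Dense(5))  # 5 = assumed action size
model.add(Activation('linear'))

# recurrent layers consume (num_samples, num_timesteps, num_features)
dummy = np.zeros((1, 1, 364))
print(model.predict(dummy).shape)  # -> (1, 5)
# a 2-D array of shape (1, 364) raises the dimension error shown below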
Everything is then trained using a mini-batch; for the FFNN, the method looked like this:
def trainModel(self, target=False):
    mini_batch = random.sample(self.memory, self.batch_size)
    X_batch = np.empty((0, self.state_size), dtype=np.float64)
    Y_batch = np.empty((0, self.action_size), dtype=np.float64)
    for i in range(self.batch_size):
        states = mini_batch[i][0]
        actions = mini_batch[i][1]
        rewards = mini_batch[i][2]
        next_states = mini_batch[i][3]
        dones = mini_batch[i][4]
        q_value = self.model.predict(states.reshape((1, len(states))))
        self.q_value = q_value
        if target:
            next_target = self.target_model.predict(next_states.reshape((1, len(next_states))))
        else:
            next_target = self.model.predict(next_states.reshape((1, len(next_states))))
        next_q_value = self.getQvalue(rewards, next_target, dones)
        X_batch = np.append(X_batch, np.array([states.copy()]), axis=0)
        Y_sample = q_value.copy()
        Y_sample[0][actions] = next_q_value
        Y_batch = np.append(Y_batch, np.array([Y_sample[0]]), axis=0)
        if dones:
            X_batch = np.append(X_batch, np.array([next_states.copy()]), axis=0)
            Y_batch = np.append(Y_batch, np.array([[rewards] * self.action_size]), axis=0)
    print(X_batch.shape)
    print(Y_batch.shape)
    self.model.fit(X_batch, Y_batch, batch_size=self.batch_size, epochs=1, verbose=0)
When I don't change the code, I naturally get the dimension error: expected simple_rnn_1_input to have 3 dimensions, but got array with shape (1, 364), because the input is still two-dimensional while the LSTM needs three dimensions. I then tried to add the third dimension manually, just to see if everything works:
mini_batch = random.sample(self.memory, self.batch_size)
X_batch = np.empty((0, self.state_size), dtype=np.float64)
Y_batch = np.empty((0, self.action_size), dtype=np.float64)
Z_batch = np.empty((0, 1), dtype=np.float64)
for i in range(self.batch_size):
    states = mini_batch[i][0]
    actions = mini_batch[i][1]
    rewards = mini_batch[i][2]
    next_states = mini_batch[i][3]
    dones = mini_batch[i][4]
    q_value = self.model.predict(states.reshape((1, len(states))))
    self.q_value = q_value
    if target:
        next_target = self.target_model.predict(next_states.reshape((1, 1, len(next_states))))
    else:
        next_target = self.model.predict(next_states.reshape((1, 1, len(next_states))))
    next_q_value = self.getQvalue(rewards, next_target, dones)
    X_batch = np.append(X_batch, np.array([states.copy()]), axis=0)
    Y_sample = q_value.copy()
    Y_sample[0][actions] = next_q_value
    Y_batch = np.append(Y_batch, np.array([Y_sample[0]]), axis=0)
    Z_batch = np.append(Z_batch, np.array([[1]]), axis=0)
    if dones:
        X_batch = np.append(X_batch, np.array([next_states.copy()]), axis=0)
        Y_batch = np.append(Y_batch, np.array([[rewards] * self.action_size]), axis=0)
        Z_batch = np.append(Z_batch, np.array([[1]]), axis=0)
self.model.fit(X_batch, Y_batch, Z_batch, batch_size=self.batch_size, epochs=1, verbose=0)
When I do this, .fit() gives the following error: TypeError: fit() got multiple values for keyword argument 'batch_size'
My question is now whether .fit() is suited for the LSTM framework in this case. In the documentation, only x and y are given as data arguments. Z seems useless in this case, but the LSTM still needs 3 dimensions as input.
My other question is: if I want to use the LSTM framework properly and not with dummy dimensions, do I have to use more than the current state?
Can I then, e.g., just append together the last 10 states so that states.shape = (10, 1, 364)? Is that a good timestep range, or should it be longer?
Kind regards!
I believe your basic issue is that the 3rd dimension needs to be added to X_batch, not passed as another argument to model.fit: a third positional argument to fit lands in the batch_size slot, which is exactly the TypeError you saw.
In particular, Keras models don't usually specify the "batch"/"sample" dimension in the model layers; it is automatically inferred from the shape of the X_batch input data. In your case, you have a SimpleRNN with input_shape=(1, 364) as the first layer. Keras interprets this to mean that the input data X_batch should have a shape like this:
(num_samples, 1, 364).
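Concretely, in your trainModel this means allocating X_batch with three dimensions and reshaping each state to (1, 1, 364) before predicting and appending. A sketch based on your snippet, keeping your variable names (untested against your environment):

X_batch = np.empty((0, 1, self.state_size), dtype=np.float64)
Y_batch = np.empty((0, self.action_size), dtype=np.float64)
for i in range(self.batch_size):
    states = mini_batch[i][0]
    actions = mini_batch[i][1]
    rewards = mini_batch[i][2]
    next_states = mini_batch[i][3]
    dones = mini_batch[i][4]
    # (1, 1, 364) = (num_samples, num_timesteps, num_features), for predict too
    q_value = self.model.predict(states.reshape((1, 1, len(states))))
    if target:
        next_target = self.target_model.predict(next_states.reshape((1, 1, len(next_states))))
    else:
        next_target = self.model.predict(next_states.reshape((1, 1, len(next_states))))
    next_q_value = self.getQvalue(rewards, next_target, dones)
    # keep the timestep axis when appending, so X_batch stays 3-D
    X_batch = np.append(X_batch, states.reshape((1, 1, len(states))), axis=0)
    Y_sample = q_value.copy()
    Y_sample[0][actions] = next_q_value
    Y_batch = np.append(Y_batch, np.array([Y_sample[0]]), axis=0)
    if dones:
        X_batch = np.append(X_batch, next_states.reshape((1, 1, len(next_states))), axis=0)
        Y_batch = np.append(Y_batch, np.array([[rewards] * self.action_size]), axis=0)
# fit only needs x and y; no Z_batch
self.model.fit(X_batch, Y_batch, batch_size=self.batch_size, epochs=1, verbose=0)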
Also, if you want to create a sequence of timesteps, you would provide X_batch with the following shape:
(num_samples, num_timesteps, 364) or something similar.
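For example, to feed the model windows of the last 10 states at prediction time, you could buffer them and stack along the timestep axis. A sketch (history here is a hypothetical buffer, not from your code):

from collections import deque
import numpy as np

num_timesteps = 10
history = deque(maxlen=num_timesteps)  # holds the 10 most recent state vectors

def to_lstm_input(history):
    # stack the buffered 364-dim states into (1, num_timesteps, 364)
    window = np.stack(history, axis=0)
    return window.reshape((1,) + window.shape)

# usage:
# history.append(state)
# if len(history) == num_timesteps:
#     q_value = model.predict(to_lstm_input(history))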
This page has some good discussion: https://keras.io/getting-started/sequential-model-guide/ (for example, search for "Stacked LSTM for sequence classification" to help illustrate). Be careful with the return_sequences=True used there, though: for a single LSTM, you probably want return_sequences=False.
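The single-layer variant would look something like this (a sketch; num_timesteps and action_size are assumed values, and return_sequences=False is the default anyway):

from keras.models import Sequential
from keras.layers import LSTM, Dense, Activation

num_timesteps = 10  # assumed window length
action_size = 5     # assumed number of actions

model = Sequential()
# return_sequences=False (the default): only the last timestep's output
# is passed on to the Dense Q-value head
model.add(LSTM(64, input_shape=(num_timesteps, 364)))
model.add(Dense(action_size, kernel_initializer='lecun_uniform'))
model.add(Activation('linear'))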
I hope this helps.