How do the dimensions work when training a Keras model?


q_values.shape <class 'tuple'>: (1, 1, 10)
(len(state_batch), self.nb_actions) <class 'tuple'>: (1, 10)

These values come from the following assertion in the SARSA agent of the keras-rl library, which fails:


    batch = self.process_state_batch(state_batch)
    q_values = self.model.predict_on_batch(batch)
    assert q_values.shape == (len(state_batch), self.nb_actions)

Here is my code:

    class MyEnv(Env):

        def __init__(self):
            self._reset()

        def _reset(self) -> None:
            self.i = 0

        def _get_obs(self) -> List[float]:
            return [1] * 20

        def reset(self) -> List[float]:
            return self._get_obs()

    model = Sequential()
    model.add(Dense(units=20, activation='relu', input_shape=(1, 20)))
    model.add(Dense(units=10, activation='softmax'))

    policy = BoltzmannQPolicy()
    agent = SARSAAgent(model=model, nb_actions=10, policy=policy)

    optimizer = Adam(lr=1e-3)
    agent.compile(optimizer, metrics=['mae'])

    env = MyEnv()
    agent.fit(env, 1, verbose=2, visualize=True)

Can someone explain how the dimensions should be set up and how they work with these libraries? I'm passing in a list of 20 inputs and want an output of 10.


  • Custom environment

    Let's first build a simple toy environment.

    1. It is a 1D maze: [1,1,0,1,1,0,1,1,0]
    2. 1: Stepping into this block of the maze gives a reward of 1
    3. 0: Stepping into this block of the maze results in death with a reward of 0
    4. Allowed actions are 0: move to the next block of the maze, and 1: hop over the next block, i.e. skip it and land on the block after it

    To implement our env in gym we need to implement 2 methods:

    • step: Takes an action, performs the step, and returns the state after the step, the reward, and a bool indicating whether the game has ended
    • reset: Resets the game and returns the current (initial) state

    Env Code

    import gym
    import numpy as np
    from gym import spaces

    class FooEnv(gym.Env):
        def __init__(self):
            self.maze = [1,1,0,1,1,0,1,1,0]
            self.curr_state = 0
            self.action_space = spaces.Discrete(2)
            self.observation_space = spaces.Discrete(1)
        def step(self, action):
            # 0: move to the next block, 1: hop over the next block
            if action == 0:
                self.curr_state += 1
            if action == 1:
                self.curr_state += 2
            if self.curr_state >= len(self.maze):
                # Walked off the end of the maze: the game ends
                reward = 0.
                done = True
            else:
                if self.maze[self.curr_state] == 0:
                    # Landed on a 0 block: death with no reward
                    reward = 0.
                    done = True
                else:
                    reward = 1.
                    done = False
            return np.array(self.curr_state), reward, done, {}
        def reset(self):
            self.curr_state = 0
            return np.array(self.curr_state)
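
As a quick sanity check of the maze rules, the stepping logic can be sketched without gym. `simulate` is a hypothetical helper, not part of gym or keras-rl; it replays a fixed list of actions from block 0 and sums the rewards.

```python
from typing import List

MAZE = [1, 1, 0, 1, 1, 0, 1, 1, 0]

def simulate(actions: List[int], maze: List[int] = MAZE) -> float:
    """Replay actions from block 0 and return the total reward (hypothetical helper)."""
    pos, total = 0, 0.0
    for action in actions:
        pos += 1 if action == 0 else 2   # 0: move one block, 1: hop over one block
        if pos >= len(maze) or maze[pos] == 0:
            break                        # off the end or on a 0 block: no reward, game over
        total += 1.0                     # landed on a safe block
    return total

print(simulate([0, 1, 0, 1, 0, 1]))  # alternating move/hop avoids every 0 block
print(simulate([0, 0]))              # the second move lands on a 0 block
```

Alternating "move" and "hop" is the pattern a trained agent would ideally learn here, since every third block of the maze is a 0.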

    Neural Network

    Now, given the current state, we want the NN to predict the action to take.

    • The NN takes the current state as input: a single number representing the maze block we are currently in
    • The NN returns one of the two possible actions, 0 or 1

    NN Code

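    from keras.models import Sequential
    from keras.layers import Dense

    model = Sequential()
    # Input: one number (the current block); output: one value per action
    model.add(Dense(units=16, activation='relu', input_shape=(1,)))
    model.add(Dense(units=8, activation='relu'))
    model.add(Dense(units=2, activation='softmax'))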
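
The shape mismatch in the question can be reproduced with plain NumPy, because a Dense layer only transforms the last axis of its input. This is a sketch under assumptions: `dense` is a hypothetical stand-in for a Keras Dense layer (without the activation), and the extra length-1 axis in the batch is inferred from the (1, 1, 10) shape reported in the question rather than taken from the keras-rl documentation.

```python
import numpy as np

def dense(x: np.ndarray, units: int, seed: int = 0) -> np.ndarray:
    """Hypothetical stand-in for a Dense layer: a matrix product over the last axis."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=(x.shape[-1], units))
    b = np.zeros(units)
    return x @ w + b

obs = [1.0] * 20                       # one observation, as in MyEnv._get_obs
batch = np.array([[obs]])              # assumed batch layout: (1, 1, 20)

q_3d = dense(dense(batch, 20), 10)     # the extra axis survives both layers
flat = batch.reshape(len(batch), -1)   # merge everything after the batch axis: (1, 20)
q_2d = dense(dense(flat, 20), 10)

print(q_3d.shape)  # (1, 1, 10)
print(q_2d.shape)  # (1, 10)
```

Under that assumption, starting the question's model with a `Flatten(input_shape=(1, 20))` layer before the Dense layers should give the (1, 10) output that the assertion expects.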

    Putting it together

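    from keras.optimizers import Adam
    from rl.agents import SARSAAgent
    from rl.policy import BoltzmannQPolicy

    policy = BoltzmannQPolicy()
    agent = SARSAAgent(model=model, nb_actions=2, policy=policy)
    optimizer = Adam(lr=1e-3)
    agent.compile(optimizer, metrics=['acc'])
    env = FooEnv()
    agent.fit(env, 10000, verbose=1, visualize=False)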
    # Test the trained agent using
    # agent.test(env, nb_episodes=5, visualize=False)


    Training for 10000 steps ...
    Interval 1 (0 steps performed)
    10000/10000 [==============================] - 54s 5ms/step - reward: 0.6128
    done, took 53.519 seconds

    If your environment is a 2D grid of size n x m, the input shape of the NN will be (n, m); flatten it before passing it to the Dense layers.
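
What flattening does to a grid observation can be sketched with NumPy; in a Keras model the `Flatten` layer plays this role. The grid size, the agent position, and the `flatten_grids` helper are made-up values and names for illustration only.

```python
import numpy as np

def flatten_grids(grids: np.ndarray) -> np.ndarray:
    """Merge every axis except the first (batch) axis, as a Flatten layer would."""
    return grids.reshape(grids.shape[0], -1)

n, m = 3, 4                              # hypothetical grid size
grid = np.zeros((n, m))
grid[1, 2] = 1.0                         # agent at row 1, column 2

flat = flatten_grids(grid[np.newaxis])   # batch of one grid: (1, 3, 4) -> (1, 12)
print(flat.shape)
print(int(flat[0].argmax()))             # position in the flat vector: row * m + col
```

If the agent wraps each observation in an extra length-1 axis, as the (1, 1, 10) shape in the question suggests, the declared input shape may need to be (1, n, m) rather than (n, m); flattening handles either case.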


