python reinforcement-learning

reinforcement learning - number of actions


Reading https://towardsdatascience.com/reinforcement-learning-temporal-difference-sarsa-q-learning-expected-sarsa-on-python-9fecfda7467e, epsilon_greedy is defined as:

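import numpy as np

def epsilon_greedy(Q, epsilon, n_actions, s, train=False):
    """
    @param Q Q values state x action -> value
    @param epsilon threshold compared against a uniform random draw
    @param n_actions number of actions available to the agent
    @param s current state (row index into Q)
    @param train if true then no random actions selected
    """
    # As quoted from the article: the greedy action is taken when train is True
    # or when the random draw falls below epsilon; otherwise a random action
    # index is sampled.
    if train or np.random.rand() < epsilon:
        action = np.argmax(Q[s, :])
    else:
        # Uniform random index in {0, ..., n_actions - 1}
        action = np.random.randint(0, n_actions)
    return action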

Is the parameter n_actions the number of actions available to the agent? For example, if an agent is learning to play football and the available actions are {kick, don't kick}, is n_actions = 2?


Solution

  • Yes, you are right. Usually you define a dictionary that maps an integer index to each action your agent can take. In the function, n_actions is used only to sample a random action index when you don't select the greedy one: np.random.randint(0, n_actions) returns an integer from 0 to n_actions - 1, so it must match the number of columns in Q.
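As a minimal sketch of that mapping for the football example in the question: the names ACTIONS, n_states and the Q values below are made up for illustration and do not come from the linked article.

```python
import numpy as np

# Hypothetical action map for the football example: index -> action name.
ACTIONS = {0: "kick", 1: "don't kick"}
n_actions = len(ACTIONS)  # 2

# Q-table with one row per state and one column per action index.
n_states = 3  # assumed number of states, for illustration only
Q = np.zeros((n_states, n_actions))
Q[0, 1] = 1.0  # pretend "don't kick" currently looks best in state 0

# Random branch: sample an index in {0, ..., n_actions - 1}.
np.random.seed(0)
random_index = np.random.randint(0, n_actions)

# Greedy branch: pick the column with the highest Q value for state 0.
greedy_index = int(np.argmax(Q[0, :]))

print("random action:", ACTIONS[random_index])
print("greedy action:", ACTIONS[greedy_index])
```

The agent only ever works with the integer indices; the dictionary is there so you (or the environment) can translate an index back into a concrete action.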