I'm a complete newbie to Reinforcement Learning, and I have a question about the choice of the activation function for the output layer of the keras-rl agents. All the examples provided by keras-rl (https://github.com/matthiasplappert/keras-rl/tree/master/examples) use a linear activation function in the output layer. Why is this? What effect should I expect if I use a different activation function? For example, if I work with an OpenAI environment with a discrete action space of 5, should I consider using softmax in the output layer of the agent? Thanks much in advance.
Some of the agents in keras-rl use a linear activation function even though they work with discrete action spaces (for example, DQN and DDQN). CEM, on the other hand, uses a softmax activation function for discrete action spaces (which is what one would expect). A sketch of the two output heads is shown right below.
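For reference, the output heads in the repository's examples look roughly like the following sketch; the hidden-layer size and input shape here are placeholders, not the exact values from the example scripts:

from keras.models import Sequential
from keras.layers import Dense, Flatten

nb_actions = 5  # e.g. a discrete action space with 5 actions

# DQN/DDQN-style head: linear output, so the network emits raw, unbounded Q-values.
dqn_model = Sequential([
    Flatten(input_shape=(1, 4)),             # placeholder (window_length, observation_dim)
    Dense(16, activation='relu'),
    Dense(nb_actions, activation='linear'),  # Q(s, a) estimates
])

# CEM-style head: softmax output, so the network emits action probabilities directly.
cem_model = Sequential([
    Flatten(input_shape=(1, 4)),
    Dense(16, activation='relu'),
    Dense(nb_actions, activation='softmax'),  # pi(a | s), sums to 1
])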
The reason behind the linear activation function for DQN and DDQN is their exploration policy, which is part of the agent. Taking BoltzmannQPolicy, one of the exploration policies used with these agents, and looking at its select_action method, we see the following:
import numpy as np

from rl.policy import Policy


class BoltzmannQPolicy(Policy):
    def __init__(self, tau=1., clip=(-500., 500.)):
        super(BoltzmannQPolicy, self).__init__()
        self.tau = tau    # temperature: lower values make the choice greedier
        self.clip = clip  # clipping range to avoid overflow in np.exp

    def select_action(self, q_values):
        assert q_values.ndim == 1
        q_values = q_values.astype('float64')
        nb_actions = q_values.shape[0]

        # Softmax over the raw Q-values, scaled by the temperature tau
        exp_values = np.exp(np.clip(q_values / self.tau, self.clip[0], self.clip[1]))
        probs = exp_values / np.sum(exp_values)
        # Sample an action according to the resulting probabilities
        action = np.random.choice(range(nb_actions), p=probs)
        return action
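To see what this does with concrete numbers, here is a small run of the policy defined above; the Q-values are made up and stand in for what a linear output layer would produce for 5 actions:

import numpy as np

np.random.seed(0)

# Hypothetical raw Q-values, as a linear output layer would emit them.
q_values = np.array([1.2, 0.3, -0.5, 2.1, 0.0])

policy = BoltzmannQPolicy(tau=1.)
action = policy.select_action(q_values)
print(action)  # an index in [0, 4], sampled from softmax(q_values / tau)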
At decision time, the output of the linear activation function in the last dense layer (the raw Q-values) is transformed by the Boltzmann exploration policy into probabilities in the range [0, 1], and the concrete action is then sampled from that distribution. Since the policy already applies this temperature-scaled softmax, a softmax is not used in the output layer.
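Putting it together, the wiring in the dqn_cartpole.py example looks roughly like this (trimmed down; exact constructor arguments can differ between keras-rl versions):

import gym
from keras.models import Sequential
from keras.layers import Dense, Flatten
from keras.optimizers import Adam

from rl.agents.dqn import DQNAgent
from rl.memory import SequentialMemory
from rl.policy import BoltzmannQPolicy

env = gym.make('CartPole-v0')
nb_actions = env.action_space.n

# The network only has to produce raw Q-values, hence the linear output layer.
model = Sequential()
model.add(Flatten(input_shape=(1,) + env.observation_space.shape))
model.add(Dense(16, activation='relu'))
model.add(Dense(nb_actions, activation='linear'))

memory = SequentialMemory(limit=50000, window_length=1)
policy = BoltzmannQPolicy()  # converts the Q-values to action probabilities at decision time
dqn = DQNAgent(model=model, nb_actions=nb_actions, memory=memory,
               nb_steps_warmup=10, target_model_update=1e-2, policy=policy)
dqn.compile(Adam(lr=1e-3), metrics=['mae'])
dqn.fit(env, nb_steps=10000, verbose=2)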
You can read more about different exploration strategies and their comparison here: https://medium.com/emergent-future/simple-reinforcement-learning-with-tensorflow-part-7-action-selection-strategies-for-exploration-d3a97b7cceaf