
Continuous action spaces in Reinforcement Learning - How does the agent choose action value from a continuous space?


I have been learning Reinforcement Learning for a few days now, and I have seen example problems like the Mountain Car problem and the Cart Pole problem.

In these problems, the action space is described as discrete. For example, in the Cart Pole problem, the agent can either move left or move right.

  1. But the examples don't talk about how much. How does the agent decide how much to move left or how much to move right? After all, these movements are actions in a continuous space. So I want to know how the agent decides what real value to choose from a continuous action space.

  2. Also, I have been using ReinforcementLearning.jl in Julia and wanted to know a way I could represent range constraints on the action space in it. For example, the real value that the agent chooses as its action should lie in a range like [10.00, 20.00[. I want to know how this can be done.


Solution

    1. But the examples don't talk about how much. How does the agent decide how much to move left or how much to move right? After all, these movements are actions in a continuous space. So I want to know how the agent decides what real value to choose from a continuous action space.

    The common solution is to assume that the agent's output follows a normal distribution. The agent then only needs to predict the mean and standard deviation; a random action is sampled from that distribution and passed to the environment.
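    A minimal sketch of this idea in Julia, using Distributions.jl (the mean, standard deviation, and action bounds here are hypothetical placeholders; in practice the mean and std come from the agent's network):

```julia
using Distributions  # provides Normal and rand

# Hypothetical outputs of the policy network for the current state.
μ, σ = 0.3, 0.5

# Build the Gaussian policy for this state and sample an action.
policy = Normal(μ, σ)
action = rand(policy)

# Environments expect the action inside fixed bounds, so the sample
# is typically clamped (some algorithms squash it with tanh instead).
action = clamp(action, -1.0, 1.0)
```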

    Another possible solution is to discretize the continuous action space, turning the problem into a discrete one. Then randomly sample an action from within the predicted bin.
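    A sketch of the discretization approach, assuming the target range [10.0, 20.0) and a split into 5 equal bins (both choices are hypothetical):

```julia
# Split [10.0, 20.0) into 5 equal bins via 6 edges: 10, 12, ..., 20.
edges = range(10.0, 20.0; length=6)

# Suppose the agent's discrete policy picked bin 3 (placeholder choice).
bin = 3
lo, hi = edges[bin], edges[bin + 1]   # the interval [14.0, 16.0)

# Sample a real-valued action uniformly within the predicted bin.
action = lo + (hi - lo) * rand()
```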

    2. Also, I have been using ReinforcementLearning.jl in Julia and wanted to know a way I could represent range constraints on the action space in it. For example, the real value that the agent chooses as its action should lie in a range like [10.00, 20.00[. I want to know how this can be done.

    You can take a look at the implementation details of the PendulumEnv. Currently, it uses .. from IntervalSets.jl to describe a continuous range.
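    For instance, `..` builds a closed interval; as far as I know, a closed-open range like [10.00, 20.00[ can be expressed with IntervalSets' `Interval` type directly:

```julia
using IntervalSets

# Closed interval [10.0, 20.0] via the `..` constructor.
space = 10.0 .. 20.0
15.0 in space          # true
20.0 in space          # true (closed on the right)

# Closed-open interval [10.0, 20.0[, matching the question's constraint.
half_open = Interval{:closed, :open}(10.0, 20.0)
20.0 in half_open      # false
```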