python-3.x reinforcement-learning openai-gym

Deep reinforcement learning with multiple "continuous actions"

Below is a high-level diagram of how my agent should look in order to interact with a custom gym environment I made.

States and actions

The environment's state has three components [s1, s2, s3], and there are six actions [a1, a2, a3, a4, a5, a6]. Each state component and each action can take any value between 0 and 1.

Question:

Which algorithms are suitable for my problem? I am aware that some algorithms are good at handling continuous action spaces (DDPG, PPO, etc.), but I can't see how they operate when they have to output multiple actions at each time-step. Finally, are there any gym environments that have the described property (multiple actions), and are there any Python implementations for solving those particular environments?

Solution

As you mentioned in your question, PPO, DDPG, TRPO, SAC, etc. are indeed suitable for reinforcement learning problems with continuous action spaces. These algorithms output a vector whose length equals your action dimension (six in your case), and each element of that vector is a real number rather than a discrete choice. Note that stochastic algorithms like PPO output the parameters of a multivariate probability distribution, from which you sample the action vector.
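To make the "one vector per time-step" idea concrete, here is a minimal sketch using only the standard library. The linear actor, its random weights, the sigmoid squashing, and the fixed log standard deviations are all assumptions made for illustration; real PPO or DDPG implementations use trained neural networks, but the shape of the output is the same.

```python
import math
import random

STATE_DIM = 3   # s1, s2, s3
ACTION_DIM = 6  # a1 ... a6

rng = random.Random(0)

# Hypothetical linear "actor": one weight row and one bias per action dimension.
weights = [[rng.uniform(-1, 1) for _ in range(STATE_DIM)] for _ in range(ACTION_DIM)]
biases = [0.0] * ACTION_DIM
# One log standard deviation per action dimension (assumed fixed here).
log_std = [-1.0] * ACTION_DIM


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def policy_mean(state):
    """Deterministic output (DDPG-style): one value in (0, 1) per action."""
    return [
        sigmoid(sum(w * s for w, s in zip(row, state)) + b)
        for row, b in zip(weights, biases)
    ]


def sample_action(state):
    """Stochastic output (PPO-style): sample every dimension from a
    diagonal Gaussian around the mean, then clip to the [0, 1] bounds."""
    means = policy_mean(state)
    return [
        min(1.0, max(0.0, rng.gauss(m, math.exp(ls))))
        for m, ls in zip(means, log_std)
    ]


state = [0.2, 0.5, 0.9]
mean_action = policy_mean(state)      # list of 6 floats
sampled_action = sample_action(state)  # list of 6 floats
```

Both calls return all six actions at once, so the environment receives a single 6-element action per step rather than six separate decisions.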

Most of the robotic environments in mujoco-py, PyBullet, robosuite, etc. have multi-dimensional continuous action spaces. There, an action has the form [torque_for_joint_1, torque_for_joint_2, ..., torque_for_joint_n], where torque_for_joint_i is a real number that determines how much that joint moves.
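For a custom environment like the one in the question, the same idea would typically be expressed as a single Box space with one entry per action. The snippet below is a sketch that assumes either gymnasium or the older gym package is installed, along with NumPy; the variable names are illustrative.

```python
import numpy as np

try:
    import gymnasium as gym  # maintained successor of OpenAI Gym
except ImportError:
    import gym  # fall back to the older package name

# Three state components, each in [0, 1].
observation_space = gym.spaces.Box(low=0.0, high=1.0, shape=(3,), dtype=np.float32)

# Six continuous actions, each in [0, 1], as ONE vector-valued space.
action_space = gym.spaces.Box(low=0.0, high=1.0, shape=(6,), dtype=np.float32)

# A random action is a length-6 array, analogous to the torque vector above.
action = action_space.sample()
```

Libraries that support continuous control generally read the action dimension from the shape of this space, so no special handling for "multiple actions" should be needed.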

Regarding implementations for solving these environments, robosuite offers sample solutions for benchmarking its environments with different algorithms. You could also look at stable-baselines or one of the other standard RL libraries.