python parallel-processing openai-gym ray

Run an OpenAI Gym environment in parallel


The following code is excerpted from https://bair.berkeley.edu/blog/2018/01/09/ray/.

import gym
import ray

ray.init()

@ray.remote
class Simulator(object):
    def __init__(self):
        self.env = gym.make("Pong-v0")
        self.env.reset()

    def step(self, action):
        return self.env.step(action)

# Create a simulator, this will start a remote process that will run
# all methods for this actor.
simulator = Simulator.remote()

observations = []
for _ in range(4):
    # Take action 0 in the simulator. This call does not block and
    # it returns a future.
    observations.append(simulator.step.remote(0))

I'm confused by this code. Does it really run in parallel? As I understand it, there is only one env, so the code above should take actions in sequential order, i.e. one at a time. If that's the case, what is the point of doing something like this?


Solution

  • You are correct, there is a single Simulator actor. The step method is invoked four times on the actor. This creates four tasks, which the actor will execute serially.

    If this is all that the application is doing, there is no advantage over creating a regular Python object and calling a method four times. However, this approach gives you the option of creating two Simulator actors and invoking methods on them in parallel. For example, you could write the following.

    # This assumes you've already called "import ray", "import gym",
    # "ray.init()", and defined the Simulator class from the original
    # post.
    
    # Create two simulators.
    simulator1 = Simulator.remote()
    simulator2 = Simulator.remote()
    
    # Step each of them four times.
    observation_ids1 = []
    observation_ids2 = []
    for _ in range(4):
        observation_ids1.append(simulator1.step.remote(0))
        observation_ids2.append(simulator2.step.remote(0))
    
    # Get the results.
    observations1 = ray.get(observation_ids1)
    observations2 = ray.get(observation_ids2)
    

    In this example, each simulator executes four tasks serially, but the two simulators are working in parallel. You can illustrate this by putting a time.sleep(1) statement in the step method and timing how long the overall computation takes.