deep-learning reinforcement-learning openai-gym monte-carlo-tree-search

How to restore previous state to gym environment

I'm trying to implement MCTS on OpenAI's Atari gym environments, which requires the ability to plan: acting in the environment and then restoring it to a previous state. I read that this can be done with the RAM version of the games:

Record the current state in a snapshot: snapshot = env.ale.cloneState()

Restore the environment to the state recorded in snapshot: env.ale.restoreState(snapshot)
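These two calls are the core of MCTS expansion: snapshot a node, try an action, then rewind. Here is a minimal sketch of that loop. It assumes only the method names used in this question (`act`, `game_over`, `cloneState`, `restoreState`). `ToyEmulator` is a hypothetical stand-in so the example runs without Atari ROMs. `evaluate_actions` is a hypothetical helper that scores each action with a fixed-length rollout, which is a simplification and not full MCTS.

```python
from typing import Dict, List


class ToyEmulator:
    """Hypothetical deterministic stand-in for env.ale (NOT the real ALE).
    It only mimics the method names act / game_over / cloneState / restoreState."""

    def __init__(self) -> None:
        self.position = 0
        self.steps = 0

    def act(self, action: int) -> int:
        # Action 1 moves right, action 2 moves left, anything else stays put.
        self.position += {1: 1, 2: -1}.get(action, 0)
        self.steps += 1
        # Reward 1 for each step that ends at position >= 2.
        return 1 if self.position >= 2 else 0

    def game_over(self) -> bool:
        return self.steps >= 5

    def cloneState(self) -> Dict[str, int]:
        return {"position": self.position, "steps": self.steps}

    def restoreState(self, state: Dict[str, int]) -> None:
        self.position = state["position"]
        self.steps = state["steps"]


def evaluate_actions(emu, actions: List[int], depth: int) -> Dict[int, int]:
    """Score each action by repeating it for up to `depth` steps from the
    current state, rewinding to the same snapshot before trying the next one."""
    root = emu.cloneState()  # snapshot of the node being expanded
    returns = {}
    for a in actions:
        total = 0
        for _ in range(depth):
            total += emu.act(a)
            if emu.game_over():
                break
        returns[a] = total
        emu.restoreState(root)  # rewind before trying the next action
    return returns


emu = ToyEmulator()
print(evaluate_actions(emu, [0, 1, 2], depth=3))  # {0: 0, 1: 2, 2: 0}
```

With a real gym Atari environment, you would presumably pass `env.ale` and a list built from `env.ale.getMinimalActionSet()`. That combination is untested here. The helper only uses the rewards returned by `act()`, not the screen.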

So I tried the RAM version of Breakout:

import gym
import matplotlib.pyplot as plt

env = gym.make("Breakout-ram-v0")
env.reset()

print("initial state:")
plt.imshow(env.render(mode="rgb_array"))

# create the first snapshot (the env stays open because it is used below)
snap0 = env.ale.cloneState()

Executing the code above shows the image of the start of the game, and snap0 records this first state. Now let's play until the end:

# env.action_space indexes the game's minimal action set, so map the
# sampled index to an ALE action before passing it to ale.act()
action_set = env.ale.getMinimalActionSet()
while True:
    env.ale.act(action_set[env.action_space.sample()])
    if env.ale.game_over():
        print("Whoops! We died!")
        break

print("final state:")
plt.imshow(env.render(mode="rgb_array"))

Executing the code above shows the image of the end of the game. Now let's load the first state back into the environment:

env.ale.restoreState(snap0)
print("\n\nAfter loading snapshot")
plt.imshow(env.render('rgb_array'))

Instead of the image of the start of the game, it shows the same image of the end of the game. The environment does not appear to revert, even though I restored the original state.

If anyone has worked with ALE and saved states like these, I'd really appreciate help figuring out what I am doing wrong. Thanks!

Solution

For anyone who comes across this in the future: there IS a bug in the Arcade Learning Environment (ALE) used by the Atari gym environments, and it is in the original C++ code. Restoring a snapshot changes the entire emulator state back to the original, WITHOUT updating the observation's image or RAM. However, if you take another action after restoring, you get the next state with a correct image and RAM.

So if you don't need images from the game, or the RAM of a specific state, you can use restoreState() without any problem. If you do need the image or RAM of the current state (for example, as input to a learning algorithm), this is a problem. You need to save the correct image when you call cloneState(). After restoring the state, use that saved image instead of the one getScreenRGB() returns right after restoreState().
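The workaround described above can be sketched as two helpers that store the screen and RAM next to the ALE state. This is a minimal sketch under assumptions. `Snapshot`, `clone_with_obs`, `restore_with_obs` and `StaleScreenToy` are hypothetical names. The toy only imitates the stale-buffer behaviour described in this answer, so the sketch runs without ALE. With a real environment, the first argument would be `env.ale`, and the sketch assumes its `cloneState`, `restoreState`, `getScreenRGB` and `getRAM` methods.

```python
from dataclasses import dataclass
from typing import Any, Tuple


@dataclass
class Snapshot:
    state: Any   # whatever ale.cloneState() returned
    screen: Any  # RGB frame captured at clone time
    ram: Any     # RAM bytes captured at clone time


def clone_with_obs(ale) -> Snapshot:
    # Copy the buffers so later frames cannot overwrite the cached ones.
    return Snapshot(ale.cloneState(), ale.getScreenRGB().copy(), ale.getRAM().copy())


def restore_with_obs(ale, snap: Snapshot) -> Tuple[Any, Any]:
    ale.restoreState(snap.state)
    # Right after restoreState() the emulator's screen/RAM buffers are stale,
    # so hand back the observation cached when the snapshot was taken.
    return snap.screen, snap.ram


class StaleScreenToy:
    """Hypothetical stand-in for env.ale (NOT the real ALE): restoreState()
    rewinds the state but leaves the screen/RAM buffers untouched until act()."""

    def __init__(self) -> None:
        self.frame = 0
        self._screen = [0]
        self._ram = [0]

    def act(self, action: int) -> int:
        self.frame += 1
        self._screen = [self.frame]  # buffers refresh only when a frame is emulated
        self._ram = [self.frame]
        return 0

    def cloneState(self) -> int:
        return self.frame

    def restoreState(self, state: int) -> None:
        self.frame = state  # buffers deliberately NOT refreshed

    def getScreenRGB(self):
        return self._screen

    def getRAM(self):
        return self._ram
```

Usage: take `snap = clone_with_obs(emu)`, play some steps, then call `screen, ram = restore_with_obs(emu, snap)`. The returned `screen` matches the snapshot, while `emu.getScreenRGB()` still shows the last frame played.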