python openai-gym

How can I start the environment from a custom initial state for Mountain Car?


I want to start the continuous Mountain Car environment from OpenAI Gym at a custom initial state. Gym does not provide a method for this. Looking at the environment's source code, I found an attribute, state, that holds the state information, so I tried to overwrite it manually. However, that does not work.

As the code below shows, the observations returned by env.step() do not match the env.state variable.

I think it is some basic Python issue that prevents me from accessing the attribute. Is there a way to access it, or some other way to start from a custom initial state? I know I could create a custom environment from the existing code and add this functionality myself. I found an issue on the GitHub repo where I think this was also suggested.

import gym
import numpy as np

env = gym.make("MountainCarContinuous-v0")

env.reset()
print(env.state)
env.state = np.array([-0.4, 0])
print(env.state)

for i in range(50):
    obs, _, _, _ = env.step([1])  # always push to the right
    print(obs, env.state)  # the observation and env.state differ
    env.render()

The output of the code:

[-0.52196493  0.        ]
[-0.4  0. ]
[-0.52047719  0.00148775] [-0.4  0. ]
[-0.51751285  0.00296433] [-0.4  0. ]
[-0.51309416  0.00441869] [-0.4  0. ]
[-0.50725424  0.00583992] [-0.4  0. ]
...

Solution

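  • gym.make() returns the environment inside a wrapper, so assigning to env.state sets an attribute on the wrapper rather than on the underlying environment. Unwrap the environment first to access its attributes directly.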

    import gym
    import numpy as np

    env = gym.make("MountainCarContinuous-v0")
    env = env.unwrapped  # strip the wrappers to reach the underlying environment
    env.state = np.array([-0.4, 0])
    print(env.state)

    for i in range(50):
        obs, _, _, _ = env.step([1])  # always push to the right
        print(obs, env.state)  # the observation and env.state now match
        env.render()

    Output:

    [-0.4  0. ]
    [-0.39940589  0.00059411] [-0.39940589  0.00059411]
    [-0.39822183  0.00118406] [-0.39822183  0.00118406]
    [-0.39645609  0.00176575] [-0.39645609  0.00176575]
    [-0.39412095  0.00233513] [-0.39412095  0.00233513]
    [-0.39123267  0.00288829] [-0.39123267  0.00288829]
    [-0.38781124  0.00342142] [-0.38781124  0.00342142]
    ...
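
To see why the original attempt printed a constant [-0.4  0. ] while the observations kept changing, here is a minimal pure-Python sketch of the mechanism. It assumes the wrapper forwards unknown attribute reads to the inner environment through __getattr__, which is how Gym's wrappers appear to behave. InnerEnv and ForwardingWrapper are hypothetical stand-ins, not Gym classes, and the dynamics are a toy, not the real Mountain Car physics.

```python
class InnerEnv:
    """Hypothetical stand-in for the real environment; it owns the true state."""

    def __init__(self):
        self.state = [-0.5, 0.0]

    def step(self, force):
        # Toy dynamics only (assumption), not the Mountain Car equations.
        position, velocity = self.state
        velocity += 0.001 * force
        position += velocity
        self.state = [position, velocity]
        return list(self.state)


class ForwardingWrapper:
    """Hypothetical stand-in for the wrapper that gym.make() returns."""

    def __init__(self, env):
        self.env = env

    def __getattr__(self, name):
        # Python calls this only when normal lookup on the wrapper fails,
        # so reads of attributes the wrapper lacks fall through to self.env.
        if name.startswith("_"):
            raise AttributeError(name)
        return getattr(self.env, name)

    def step(self, force):
        return self.env.step(force)

    @property
    def unwrapped(self):
        return self.env


# Assigning on the wrapper creates a new attribute there; the inner env is untouched.
wrapped = ForwardingWrapper(InnerEnv())
wrapped.state = [-0.4, 0.0]
obs_wrapped = wrapped.step(1.0)  # still evolves from the inner [-0.5, 0.0]

# Assigning on the unwrapped env changes the state that step() actually uses.
fixed = ForwardingWrapper(InnerEnv()).unwrapped
fixed.state = [-0.4, 0.0]
obs_fixed = fixed.step(1.0)

print(obs_wrapped, wrapped.state)  # the two values differ
print(obs_fixed, fixed.state)      # the two values match
```

Reads go through the forwarding hook, but writes do not, so the assignment in the question shadows the inner state instead of replacing it. That is why env.unwrapped is needed before setting the state.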