
How to make an openai-gym environment start from a specific state instead of the one returned by `env.reset()`?


Today, while trying to implement an RL agent in an openai-gym environment, I noticed that agents always seem to be trained starting from the initial state returned by env.reset(), i.e.

import gym

env = gym.make("CartPole-v0")
initial_observation = env.reset()  # <-- Note
done = False

while not done:
    action = env.action_space.sample()  
    next_observation, reward, done, info = env.step(action)

env.close()  # close the environment

So naturally the agent follows the route env.reset() -(action)-> next_state -(action)-> next_state -(action)-> ... -(action)-> done, which is one episode. But how can an agent start from a specific state, such as one in the middle of an episode, and take an action from there? For example, suppose I sample an experience (s, a, r, ns, done) from the replay buffer. What if I want to start the agent directly from the state ns, pick an action with a Q-network, and then step forward n steps? Something like this:

import gym

env = gym.make("CartPole-v0")
initial_observation = ns  # not env.reset() 
done = False

observation = initial_observation
while not done:
    action = DQN(observation)  # pick an action with the Q-network
    observation, reward, done, info = env.step(action)
    # break after n steps, or when done is True

env.close()  # close the environment

But even though I assign ns to the variable initial_observation, I don't think the agent or the env is aware of it at all. How can I tell the gym env to use ns as the initial observation, so that the agent starts from that specific state and continues training directly from it?


Solution

  • AFAIK, the current implementation of most OpenAI Gym envs (including the CartPole-v0 used in your question) doesn't provide any mechanism to initialize the environment in a given state.

    However, it shouldn't be too complex to modify the CartPoleEnv.reset() method in order to accept an optional parameter that acts as initial state.
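
To illustrate the idea of a reset() that takes an optional initial state, here is a minimal sketch. It deliberately does not use gym: ToyCartEnv, its two-number state, its dynamics, and the rollout_from helper are all hypothetical stand-ins, written only to show the pattern. With the real CartPoleEnv you would presumably apply the same change in a subclass and assign the given state to the attribute where the env keeps its internal state (self.state in the CartPole source, as far as I know); check the source of the gym version you have installed before relying on that.

```python
import random


class ToyCartEnv:
    """Hypothetical gym-style environment (NOT the real CartPoleEnv)."""

    def __init__(self, limit=5.0, seed=None):
        self.limit = limit  # episode ends once |position| exceeds this
        self.rng = random.Random(seed)
        self.state = None

    def reset(self, initial_state=None):
        """Start an episode, optionally from a caller-supplied state."""
        if initial_state is None:
            # Default behaviour: a small random start state.
            self.state = (self.rng.uniform(-0.05, 0.05), 0.0)
        else:
            # Use the caller's state, e.g. `ns` from a replay buffer.
            self.state = tuple(float(x) for x in initial_state)
        return self.state

    def step(self, action):
        """Apply action 0 (push left) or 1 (push right) for one time step."""
        position, velocity = self.state
        velocity += 0.1 if action == 1 else -0.1
        position += velocity
        self.state = (position, velocity)
        done = abs(position) > self.limit
        reward = 1.0
        return self.state, reward, done, {}


def rollout_from(env, start_state, policy, n_steps):
    """Run at most n_steps from start_state and collect the transitions."""
    observation = env.reset(initial_state=start_state)
    transitions = []
    for _ in range(n_steps):
        action = policy(observation)
        next_observation, reward, done, info = env.step(action)
        transitions.append((observation, action, reward, next_observation, done))
        observation = next_observation
        if done:
            break
    return transitions


env = ToyCartEnv(seed=0)
ns = (1.0, 0.5)  # stand-in for a next-state sampled from the replay buffer
# The lambda is a placeholder policy; a Q-network would go here instead.
steps = rollout_from(env, ns, policy=lambda obs: 1, n_steps=3)
```

The point of the design is that the environment's internal state, not just a local variable in the training loop, is set to ns, so every later step() continues from that state. Calling env.reset() with no argument still behaves like an ordinary reset.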