
Training an agent in a custom Gymnasium environment with Stable-Baselines3 changes the environment


I built a custom Gymnasium environment and trained an agent on it with Stable-Baselines3, but the learning process changes my environment.

>>> print(env.step(2))
(510, -0.1, False, False, {})
>>> model.learn(total_timesteps=10000)
>>> print(env.step(2))
(104, -0.1, False, False, {'TimeLimit.truncated': False})

For the same action (2), the environment gives me a different observation after learning. Does anyone know why? Thanks!


Solution

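  • The cause depends on a number of factors, so I cannot say for certain what it is. One likely contributor: if the model was created with this same env object, model.learn() steps and resets that very instance during training, so its internal state afterwards differs from its state before. If you want a step to produce the same result, reset your environment first (and, if the starting state is random, with the same seed):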

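    import gymnasium as gym

    # Initialize your environment
    env = gym.make(...)

    # Reset the environment with a fixed seed, then take one step
    state, info = env.reset(seed=42)
    print(env.step(2))
    # (510, -0.1, False, False, {})

    # Learn; `model` is assumed to have been created with this env
    model.learn(total_timesteps=10000)

    # Reset the environment again with the same seed before stepping
    state, info = env.reset(seed=42)
    print(env.step(2))
    # (510, -0.1, False, False, {})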
    

    In reinforcement learning terms, if the environment has an initial state distribution, i.e. more than one state can be the starting state, you must pass the same seed to reset() to reproduce the exact same result from the first step (assuming the transitions themselves are deterministic). If there is only one possible starting state, a seed is not strictly necessary.
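
    The effect can be sketched without any RL library. The example below is only an illustration under stated assumptions: ToyEnv is a hypothetical stand-in that imitates the reset(seed=...) / step(action) call shape, and fake_learn is a made-up placeholder for model.learn() that simply steps the environment it is given. Neither is part of Gymnasium or Stable-Baselines3.

```python
import random


class ToyEnv:
    """Hypothetical stand-in for a custom environment (not the Gymnasium API itself)."""

    N_STATES = 1000

    def __init__(self):
        self._rng = random.Random()
        self._state = 0

    def reset(self, seed=None):
        # Re-seed only when a seed is passed, similar in spirit to reset(seed=...).
        if seed is not None:
            self._rng = random.Random(seed)
        # The starting state is drawn from a distribution over all states.
        self._state = self._rng.randrange(self.N_STATES)
        return self._state, {}

    def step(self, action):
        # Deterministic transition: the result depends only on (state, action).
        self._state = (self._state * 7 + action * 13 + 1) % self.N_STATES
        return self._state, -0.1, False, False, {}


def fake_learn(env, total_timesteps):
    """Made-up placeholder for model.learn(): it steps the same env object it is given."""
    env.reset()
    for t in range(total_timesteps):
        env.step(t % 3)


env = ToyEnv()

env.reset(seed=42)
before = env.step(2)

fake_learn(env, total_timesteps=100)
without_reset = env.step(2)  # continues from wherever "training" left the env

env.reset(seed=42)
after = env.step(2)

print("before training:      ", before)
print("after, without reset: ", without_reset)
print("after, reset(seed=42):", after)
print("reproduced:", before == after)
```

    Resetting with the same seed restores the same starting state, so the first step matches again. Without the reset, the step simply continues from whatever state the training loop left behind.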