I customized a Gymnasium environment and trained it with stable_baselines3, but the learning process changes my environment.
>>> print(env.step(2))
(510, -0.1, False, False, {})
>>> model.learn(total_timesteps=10000)
>>> print(env.step(2))
(104, -0.1, False, False, {'TimeLimit.truncated': False})
For the same action (2), the environment gives me a different observation after learning. Does anyone know why? Thanks!
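This depends on several factors, so I cannot say for certain what the cause is. One likely contributor is that model.learn() itself resets and steps the environment many times, so the environment is no longer in the state it was in before training. If you want a step to produce the same result, reset your environment first (ideally with the same seed):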
import gymnasium as gym

# Initialize your environment
env = gym.make(...)
# Reset your environment with a fixed seed
state, info = env.reset(seed=42)
print(env.step(2))
# (510, -0.1, False, False, {})
# Learn (model is assumed to have been created on this env)
model.learn(total_timesteps=10000)
# Reset your environment again with the same seed
state, info = env.reset(seed=42)
print(env.step(2))
# (510, -0.1, False, False, {})
In reinforcement learning terms, if the environment has an initial state distribution, i.e. more than one state can be the starting state, you must reset with the same seed to reproduce the exact same result for the first step. If there is only one possible starting state, a seed is not strictly necessary.
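To make the point about initial state distributions concrete, here is a minimal sketch that does not use Gymnasium or stable_baselines3 at all. `ToyEnv` and `fake_learn` are hypothetical stand-ins: `ToyEnv` only mimics the `reset(seed=...)` / `step(action)` signatures, and `fake_learn` only imitates the fact that training resets and steps the environment many times. The reward of -0.1 and the action 2 are borrowed from the question; everything else is assumed for illustration.

```python
import random


class ToyEnv:
    """Hypothetical stand-in that mimics the Gymnasium reset/step signatures."""

    def __init__(self):
        self._rng = random.Random()
        self.state = 0

    def reset(self, seed=None):
        # Re-seed the generator only when a seed is passed.
        if seed is not None:
            self._rng = random.Random(seed)
        # Initial state distribution: uniform over 0..999.
        self.state = self._rng.randrange(1000)
        return self.state, {}

    def step(self, action):
        # Deterministic transition, wrapped to stay in range.
        self.state = (self.state + action) % 1000
        return self.state, -0.1, False, False, {}


def fake_learn(env, total_timesteps):
    """Imitates training: resets the env, then steps it many times."""
    env.reset()
    for _ in range(total_timesteps):
        env.step(random.randrange(3))


env = ToyEnv()
env.reset(seed=42)
before = env.step(2)

fake_learn(env, total_timesteps=10000)
drifted = env.step(2)  # continues from wherever "training" left the env

env.reset(seed=42)
after = env.step(2)  # same seed, same starting state, same first step

print(before == after)
```

Here `drifted` will almost always differ from `before`, because the environment was not reset after the training loop, while `after` matches `before` because both follow a reset with the same seed.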