I'm running a reinforcement learning program in a gym environment (BipedalWalker-v2) implemented in TensorFlow. I've set the random seeds of the environment, TensorFlow, and NumPy manually as follows:
import os
import random
import gym
import numpy as np
import tensorflow as tf

os.environ['PYTHONHASHSEED'] = str(42)
random.seed(42)
np.random.seed(42)
tf.set_random_seed(42)

env = gym.make('BipedalWalker-v2')
env.seed(0)

# single-threaded execution, to rule out nondeterministic op scheduling
config = tf.ConfigProto(intra_op_parallelism_threads=1, inter_op_parallelism_threads=1)
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)
# run the graph with sess
However, I get different results every time I run my program, without changing any code. Why are the results not consistent, and what should I do if I want to obtain the same result every time?
The only places I can think of that may introduce randomness (other than the neural networks) are:

- tf.truncated_normal, used to generate the random noise epsilon that implements the noisy layer
- np.random.uniform, used to randomly select samples from the replay buffer

(Seeded versions of both are sketched below.) I've also noticed that the scores I get are fairly consistent over the first 10 episodes, but they then begin to diverge. Other quantities, such as the losses, show a similar trend but are not numerically identical.
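For reference, here is a minimal sketch of how I understand both sources can be seeded explicitly; the shape, buffer size, and batch size are just placeholders:

import numpy as np
import tensorflow as tf

# Per-op seed for the noisy layer's noise, in addition to the
# graph-level tf.set_random_seed above (shape is a placeholder)
epsilon = tf.truncated_normal(shape=[128], mean=0.0, stddev=1.0, seed=42)

# A dedicated, seeded generator for replay-buffer sampling, instead of
# relying on the global np.random state (sizes are placeholders)
buffer_size, batch_size = 100000, 64
replay_rng = np.random.RandomState(42)
sample_idx = replay_rng.randint(0, buffer_size, size=batch_size)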
EDIT: I've also set "PYTHONHASHSEED" and switched to single-threaded CPU execution as @jaypops96 described, but I still cannot reproduce the results. The code block above has been updated accordingly.
I suggest checking whether your TensorFlow graph contains nondeterministic operations. For example, reduce_sum was one such operation before TensorFlow 1.2. These operations are nondeterministic because floating-point addition and multiplication are nonassociative (the order in which floating-point numbers are added or multiplied affects the result) and because such operations don't guarantee that their inputs are added or multiplied in the same order every time. See also this question.
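To see why reduction order matters, here is a small illustrative sketch in plain NumPy: summing the same numbers in two different orders typically produces results that disagree in the low bits.

import numpy as np

rng = np.random.RandomState(0)
values = rng.standard_normal(10**6).astype(np.float32)

# Sum the same numbers in two different orders.
in_order = np.sum(values)
shuffled = np.sum(rng.permutation(values))

# Typically prints False: each ordering accumulates rounding
# error differently, so the sums differ slightly.
print(in_order == shuffled, in_order, shuffled)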
EDIT (Sep. 20, 2020): The GitHub repository framework-determinism
has more information about sources of nondeterminism in machine learning frameworks, particularly TensorFlow.
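For example, that repository's companion package offered a patch for deterministic GPU ops. Roughly, based on its README at the time (check the repo for the API matching your TensorFlow version):

# pip install tensorflow-determinism

# For TF 1.14 through 2.0 on GPU, the package provided a monkey patch:
from tfdeterminism import patch
patch()

# For TF 2.1 and later, the equivalent built-in switch is an environment
# variable, set before TensorFlow executes any ops:
# import os
# os.environ['TF_DETERMINISTIC_OPS'] = '1'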