I'm running a reinforcement learning program in a gym environment (BipedalWalker-v2) implemented in TensorFlow. I've set the random seeds of the environment, TensorFlow, and NumPy manually as follows:
import os
import random
import gym
import numpy as np
import tensorflow as tf

os.environ['PYTHONHASHSEED'] = str(42)
random.seed(42)
np.random.seed(42)
tf.set_random_seed(42)

env = gym.make('BipedalWalker-v2')
env.seed(0)

# single-threaded execution, to rule out nondeterministic op scheduling
config = tf.ConfigProto(intra_op_parallelism_threads=1, inter_op_parallelism_threads=1)
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)
# run the graph with sess
However, I get different results every time I run my program, without changing any code. Why are the results not consistent, and what should I do if I want to obtain the same result every time?
The only places I can think of that may introduce randomness (other than the neural networks) are:

- tf.truncated_normal, used to generate the random noise epsilon that implements the noisy layer
- np.random.uniform, used to randomly select samples from the replay buffer

(Seeded versions of both are sketched below.) I've also noticed that the scores I get are fairly consistent over the first 10 episodes, but they then begin to diverge. Other quantities, such as the losses, show a similar trend but are not numerically identical.
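For reference, here is a minimal sketch of how I understand both sources can be seeded explicitly; the shape, buffer size, and batch size are just placeholders:

import numpy as np
import tensorflow as tf

# Per-op seed for the noisy layer's noise, in addition to the
# graph-level tf.set_random_seed above (shape is a placeholder)
epsilon = tf.truncated_normal(shape=[128], mean=0.0, stddev=1.0, seed=42)

# A dedicated, seeded generator for replay-buffer sampling, instead of
# relying on the global np.random state (sizes are placeholders)
buffer_size, batch_size = 100000, 64
replay_rng = np.random.RandomState(42)
sample_idx = replay_rng.randint(0, buffer_size, size=batch_size)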
EDIT: I've also set "PYTHONHASHSEED" and switched to single-threaded CPU execution as @jaypops96 described, but I still cannot reproduce the results. The code block above has been updated accordingly.
I suggest checking whether your TensorFlow graph contains nondeterministic operations. For example, reduce_sum was one such operation before TensorFlow 1.2. These operations are nondeterministic because floating-point addition and multiplication are nonassociative (the order in which floating-point numbers are added or multiplied affects the result) and because such operations don't guarantee that their inputs are added or multiplied in the same order every time. See also this question.
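To see why reduction order matters, here is a small illustrative sketch in plain NumPy: summing the same numbers in two different orders typically produces results that disagree in the low bits.

import numpy as np

rng = np.random.RandomState(0)
values = rng.standard_normal(10**6).astype(np.float32)

# Sum the same numbers in two different orders.
in_order = np.sum(values)
shuffled = np.sum(rng.permutation(values))

# Typically prints False: each ordering accumulates rounding
# error differently, so the sums differ slightly.
print(in_order == shuffled, in_order, shuffled)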
EDIT (Sep. 20, 2020): The GitHub repository framework-determinism
has more information about sources of nondeterminism in machine learning frameworks, particularly TensorFlow.
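For example, that repository's companion package offered a patch for deterministic GPU ops. Roughly, based on its README at the time (check the repo for the API matching your TensorFlow version):

# pip install tensorflow-determinism

# For TF 1.14 through 2.0 on GPU, the package provided a monkey patch:
from tfdeterminism import patch
patch()

# For TF 2.1 and later, the equivalent built-in switch is an environment
# variable, set before TensorFlow executes any ops:
# import os
# os.environ['TF_DETERMINISTIC_OPS'] = '1'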