When I repeatedly run tf.estimator.LinearRegressor, the results are slightly different each time. I'm guessing that's because of the shuffle=True here:

input_fn = tf.estimator.inputs.numpy_input_fn(
    {"x": x_train}, y_train, batch_size=4, num_epochs=None, shuffle=True)
Which is fine as far as it goes, but when I try to make it deterministic by seeding the random number generators in both np and tf:

np.random.seed(1)
tf.set_random_seed(1)

the results are still slightly different each time. What am I missing?
tf.set_random_seed sets the graph-level seed, but it's not the only source of randomness: there is also an operation-level seed, which needs to be provided for each op.
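
For illustration, here's how the two seed levels behave for ordinary TensorFlow random ops (TF 1.x API; a minimal sketch, unrelated to what numpy_input_fn does internally):

import tensorflow as tf

tf.set_random_seed(1)                 # graph-level seed
u = tf.random_uniform([2])            # no op-level seed: derived deterministically from the graph seed
v = tf.random_uniform([2], seed=42)   # explicit op-level seed

with tf.Session() as sess:
    print(sess.run([u, v]))  # same values on every fresh run of the script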
Unfortunately, tf.estimator.inputs.numpy_input_fn does not expose a seed argument alongside shuffle to pass down to the underlying ops (source code). As a result, the _enqueue_data function always gets seed=None, which overrides any seed you set in advance. It's also worth noting that many of the underlying feed functions use standard Python random.seed for shuffling, not TensorFlow's random ops (see _ArrayFeedFn, _OrderedDictNumpyFeedFn, etc.).
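
To see why that matters: in Python, random.seed(None) reseeds the generator from system entropy (or the current time), discarding any earlier seed. A minimal sketch of the effect, as I understand it from the description above:

import random

random.seed(1)     # the seed you set in advance
random.seed(None)  # what effectively happens when seed=None is passed down
print(random.randrange(100))  # varies between runs despite the earlier random.seed(1)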
Summary: there is currently no way to guarantee stable execution with shuffle=True, at least with the current API. Your only option is to shuffle the data yourself and pass shuffle=False.
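
For example, a sketch of that workaround (x_train and y_train are the arrays from the question; the one-time permutation is my own illustration):

import numpy as np
import tensorflow as tf

np.random.seed(1)
perm = np.random.permutation(len(x_train))  # deterministic shuffle order
x_shuffled, y_shuffled = x_train[perm], y_train[perm]

input_fn = tf.estimator.inputs.numpy_input_fn(
    {"x": x_shuffled}, y_shuffled, batch_size=4, num_epochs=None, shuffle=False)

Keep in mind this shuffles once up front, so every epoch sees the same order, which is weaker than continuous reshuffling.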