When I repeatedly run tf.estimator.LinearRegressor, the results are slightly different each time. I'm guessing that's because of the shuffle=True here:

input_fn = tf.estimator.inputs.numpy_input_fn(
    {"x": x_train}, y_train, batch_size=4, num_epochs=None, shuffle=True)
Which is fine as far as it goes, but when I try to make it deterministic by seeding the random number generators in both np and tf:

np.random.seed(1)
tf.set_random_seed(1)

the results are still slightly different each time. What am I missing?
tf.set_random_seed sets the graph-level seed, but it's not the only source of randomness: there is also an operation-level seed, which needs to be provided for each op.
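
For illustration, here's how the two seed levels behave for ordinary TensorFlow random ops (TF 1.x API; a minimal sketch, unrelated to what numpy_input_fn does internally):

import tensorflow as tf

tf.set_random_seed(1)                 # graph-level seed
u = tf.random_uniform([2])            # no op-level seed: derived deterministically from the graph seed
v = tf.random_uniform([2], seed=42)   # explicit op-level seed

with tf.Session() as sess:
    print(sess.run([u, v]))  # same values on every fresh run of the script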
Unfortunately, tf.estimator.inputs.numpy_input_fn does not expose a seed argument alongside shuffle to pass down to the underlying ops (source code). As a result, the _enqueue_data function always gets seed=None, which overrides any seed you set in advance. It's also worth noting that many of the underlying feed functions use standard Python random.seed for shuffling, not TensorFlow's random ops (see _ArrayFeedFn, _OrderedDictNumpyFeedFn, etc.).
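
To see why that matters: in Python, random.seed(None) reseeds the generator from system entropy (or the current time), discarding any earlier seed. A minimal sketch of the effect, as I understand it from the description above:

import random

random.seed(1)     # the seed you set in advance
random.seed(None)  # what effectively happens when seed=None is passed down
print(random.randrange(100))  # varies between runs despite the earlier random.seed(1)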
Summary: there is currently no way to guarantee stable execution with shuffle=True, at least with the current API. Your only option is to shuffle the data yourself and pass shuffle=False.
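
For example, a sketch of that workaround (x_train and y_train are the arrays from the question; the one-time permutation is my own illustration):

import numpy as np
import tensorflow as tf

np.random.seed(1)
perm = np.random.permutation(len(x_train))  # deterministic shuffle order
x_shuffled, y_shuffled = x_train[perm], y_train[perm]

input_fn = tf.estimator.inputs.numpy_input_fn(
    {"x": x_shuffled}, y_shuffled, batch_size=4, num_epochs=None, shuffle=False)

Keep in mind this shuffles once up front, so every epoch sees the same order, which is weaker than continuous reshuffling.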