I use a convolutional neural network for a regression task (i.e. the final layer of the network has one neuron with linear activation) and this works fine (enough). When I use the exact same model packaged with tf.keras.estimator.model_to_estimator, the estimator seems to fit, but the training loss stops decreasing very early on. The final eval losses (after 4 epochs each) are about 0.4 (mean abs. error) for the bare Keras model and about 2.5 (mean abs. error) for the estimator.
To demonstrate the issue, I apply my model in both bare and estimator-packaged form to the MNIST dataset. (I know that MNIST is a classification task and that it doesn't really make sense to treat it as a regression problem, but the example still illustrates my point.)
I find it very surprising that, when I package a classification neural network into an estimator in the same way, the bare Keras model and its packaged estimator version perform equally well (the classification case is not included in the example code below, but a sketch follows this paragraph). The difference only occurs for the regression task. I expect I'm either missing something pretty basic or this behaviour is due to a bug in TensorFlow.
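For reference, here is a minimal sketch of what that classification variant looks like. This is not the exact code I ran; the function name get_classification_model is made up for illustration, and it assumes a 10-class softmax head with sparse categorical cross-entropy, everything else being identical to the regression model below:

import tensorflow as tf

def get_classification_model(IM_WIDTH=28, num_color_channels=1):
    """Sketch: same convolutional stack as the regression model, but with a 10-class softmax head."""
    inputs = tf.keras.Input(shape=(IM_WIDTH, IM_WIDTH, num_color_channels))
    x = tf.keras.layers.Conv2D(32, 3, activation='relu')(inputs)
    x = tf.keras.layers.MaxPooling2D(3)(x)
    x = tf.keras.layers.Conv2D(64, 3, activation='relu')(x)
    x = tf.keras.layers.MaxPooling2D(3)(x)
    x = tf.keras.layers.Flatten()(x)
    x = tf.keras.layers.Dense(64, activation='relu')(x)
    outputs = tf.keras.layers.Dense(10, activation='softmax')(x)  # 10 classes instead of 1 linear unit
    model = tf.keras.Model(inputs=[inputs], outputs=[outputs])
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',  # integer MNIST labels, no one-hot encoding
                  metrics=['accuracy'])
    return model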
To make sure there are as few differences as possible between the inputs to the models, I package MNIST as a tf.data.Dataset and return it from an input function, which is passed to the estimator. For the bare Keras model, I use the same input function to obtain the tf.data.Dataset and pass it directly to the fit function.
# python 3.6. Tested with tensorflow-gpu-1.14 and tensorflow-cpu-2.0
import tensorflow as tf
import numpy as np
def get_model(IM_WIDTH=28, num_color_channels=1):
    """Create a very simple convolutional neural network using a tf.keras Functional Model."""
    input = tf.keras.Input(shape=(IM_WIDTH, IM_WIDTH, num_color_channels))
    x = tf.keras.layers.Conv2D(32, 3, activation='relu')(input)
    x = tf.keras.layers.MaxPooling2D(3)(x)
    x = tf.keras.layers.Conv2D(64, 3, activation='relu')(x)
    x = tf.keras.layers.MaxPooling2D(3)(x)
    x = tf.keras.layers.Flatten()(x)
    x = tf.keras.layers.Dense(64, activation='relu')(x)
    output = tf.keras.layers.Dense(1, activation='linear')(x)
    model = tf.keras.Model(inputs=[input], outputs=[output])
    model.compile(optimizer='adam', loss="mae", metrics=['mae'])
    model.summary()
    return model
def input_fun(train=True):
    """Load MNIST and return the training or test set as a tf.data.Dataset; valid input function for tf.estimator."""
    (train_images, train_labels), (eval_images, eval_labels) = tf.keras.datasets.mnist.load_data()
    train_images = train_images.reshape((60_000, 28, 28, 1)).astype(np.float32) / 255.
    eval_images = eval_images.reshape((10_000, 28, 28, 1)).astype(np.float32) / 255.
    # train_labels = train_labels.astype(np.float32)  # these two lines don't affect behaviour.
    # eval_labels = eval_labels.astype(np.float32)
    # For a neural network with one neuron in the final layer, it doesn't seem to matter if target data is float or int.
    if train:
        dataset = tf.data.Dataset.from_tensor_slices((train_images, train_labels))
        dataset = dataset.shuffle(buffer_size=100).repeat(None).batch(32).prefetch(1)
    else:
        dataset = tf.data.Dataset.from_tensor_slices((eval_images, eval_labels))
        dataset = dataset.batch(32).prefetch(1)  # note: prefetching does not affect behaviour
    return dataset
model = get_model()
train_input_fn = lambda: input_fun(train=True)
eval_input_fn = lambda: input_fun(train=False)
NUM_EPOCHS, STEPS_PER_EPOCH = 4, 1875  # 1875 = number of training images (60,000) / batch size (32)
USE_ESTIMATOR = False # change this to compare model/estimator. Estimator performs much worse for no apparent reason
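# Note: below, the estimator receives the input *function* (which it calls
# internally), while the bare Keras model receives the dataset returned by that
# same function. This is the only intended difference between the two code paths.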
if USE_ESTIMATOR:
    estimator = tf.keras.estimator.model_to_estimator(
        keras_model=model, model_dir="model_directory",
        config=tf.estimator.RunConfig(save_checkpoints_steps=200, save_summary_steps=200))
    train_spec = tf.estimator.TrainSpec(input_fn=train_input_fn, max_steps=STEPS_PER_EPOCH * NUM_EPOCHS)
    eval_spec = tf.estimator.EvalSpec(input_fn=eval_input_fn, throttle_secs=0)
    tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)
    print("Training complete. Evaluating Estimator:")
    print(estimator.evaluate(eval_input_fn))
    # final train loss with estimator: ~2.5 (mean abs. error).
else:
    dataset = train_input_fn()
    model.fit(dataset, steps_per_epoch=STEPS_PER_EPOCH, epochs=NUM_EPOCHS)
    print("Training complete. Evaluating Keras model:")
    print(model.evaluate(eval_input_fn()))
    # final train loss with Keras model: ~0.4 (mean abs. error).
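For what it's worth, a quick sanity check (a minimal sketch, assuming TF 2.x eager execution) confirms that both code paths receive batches with the same shapes and dtypes from this input function:

# Pull a single batch from the shared input function and inspect it.
for images, labels in input_fun(train=True).take(1):
    print(images.shape, images.dtype)  # expected: (32, 28, 28, 1) float32
    print(labels.shape, labels.dtype)  # expected: (32,) uint8 (MNIST labels as loaded)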
I have filed a bug report at https://github.com/tensorflow/tensorflow/issues/35833#issue-549185982
To avoid the discussion being split across websites, I am marking this topic as solved.