Tags: python, tensorflow, machine-learning, tensorflow-estimator, tf.keras

Why does this simple tf.keras model not train after converting to a tensorflow estimator?


I'm trying to convert a tf.keras model to a tensorflow estimator using tf.keras.estimator.model_to_estimator, but the resulting estimator doesn't appear to be trainable.

I've tried modelling y = (x_1 + x_2)/2 using both the sequential and functional tf.keras APIs, and while the tf.keras models train perfectly well on their own, neither works after conversion to an estimator. A tf.estimator.LinearRegressor with the same input functions does train (see the baseline sketch after the example below), so I don't think the problem lies with the input functions.

Here's a minimal working example for the sequentially defined tf.keras model:

import numpy as np
import tensorflow as tf
import functools

sample_size = 1000

x_train = np.random.randn(sample_size, 2).astype(np.float32)
y_train = np.mean(x_train, axis=1).astype(np.float32) 

x_test = np.random.randn(sample_size, 2).astype(np.float32)
y_test = np.mean(x_test, axis=1).astype(np.float32) 

model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Dense(1, input_shape=(2,), name="Prediction"))
adam = tf.keras.optimizers.Adam(lr=0.1)
model.compile(loss='MSE', optimizer=adam)
#model.fit(x=x_train, y=y_train, epochs=10, batch_size=64)  # This works

est = tf.keras.estimator.model_to_estimator(keras_model=model)

def train_input_fn(batch_size):
    dataset = tf.data.Dataset.from_tensor_slices(({"Prediction_input": x_train}, y_train))
    return dataset.shuffle(sample_size).batch(batch_size).repeat()

def eval_input_fn(batch_size):
    dataset = tf.data.Dataset.from_tensor_slices(({"Prediction_input": x_test}, y_test))
    return dataset.batch(batch_size)

est.train(input_fn=functools.partial(train_input_fn, 64), steps=10)

eval_metrics = est.evaluate(input_fn=functools.partial(eval_input_fn, 1))
print('Evaluation metrics:', eval_metrics)
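
For comparison, here is roughly what the canned-estimator baseline mentioned above looks like. It reuses the same input functions; the feature column definition and the step count are my assumptions, not part of the original code.

# Baseline sketch (assumed details): a canned LinearRegressor fed by the same
# input functions. The feature key "Prediction_input" must match the key used
# in train_input_fn / eval_input_fn above.
feature_columns = [tf.feature_column.numeric_column("Prediction_input", shape=(2,))]
linear_est = tf.estimator.LinearRegressor(feature_columns=feature_columns)
linear_est.train(input_fn=functools.partial(train_input_fn, 64), steps=500)
print('LinearRegressor metrics:', linear_est.evaluate(input_fn=functools.partial(eval_input_fn, 1)))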

The estimator is trained for 10 steps, which should be more than enough to bring the loss down. Increasing the number of steps makes no difference, as far as I can tell.

I'm running this on tensorflow 1.5.0. Compiling the tf.keras model produces a deprecation warning about calling reduce_mean with keep_dims, but that doesn't stop the tf.keras model from training perfectly well when fit directly.

Is this a bug, or am I missing something?


Solution

  • It turns out all I needed to do was reshape the target to have shape (sample_size, 1) and increase the number of training steps (sketched below). I'm still not sure what the estimator was doing when the target had shape (sample_size,), or why this isn't a problem for the canned estimator, but at least I know how to avoid it.
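
A minimal sketch of the fix, applied to the example above. The step count of 500 is just an illustrative increase, not a tuned value.

# Fix sketch: give the targets an explicit trailing dimension so they have
# shape (sample_size, 1), then train for more steps. The input functions read
# the module-level y_train / y_test when called, so they pick up the reshaped
# arrays automatically.
y_train = y_train.reshape(sample_size, 1)
y_test = y_test.reshape(sample_size, 1)

est = tf.keras.estimator.model_to_estimator(keras_model=model)
est.train(input_fn=functools.partial(train_input_fn, 64), steps=500)
print('Evaluation metrics:', est.evaluate(input_fn=functools.partial(eval_input_fn, 1)))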