Tags: tensorflow, google-cloud-ml, hyperparameters

Too many hyperparameter tuning metrics written out


My hyperparameter tuning job on Cloud ML Engine fails with the error message:

Too many hyperparameter tuning metrics were written by Hyperparameter Tuning Trial #...

How do I fix this?


Solution

  • First, check that you are not in fact writing out too many evaluation metrics. Are you specifying an appropriate throttle_secs in the EvalSpec? Evaluation (and therefore metric writing) happens at most once every throttle_secs.

  • Second, check the metric you are tuning on. Is it the loss? The loss is written out during both training and evaluation, so if your tuning metric is the loss, the tuning service also picks up the training-time values and sees far more metric values than it expects.

    The simplest fix is to define a new evaluation metric and use that metric ("rmse" in my example) as your hyperparameterMetricTag in the tuning configuration; see the configuration sketch after the code below.

    Here's an example that shows both of these fixes:

    import tensorflow as tf

    # create a dedicated metric for hyperparameter tuning, so we are
    # not tuning on the loss (which is also reported during training)
    def my_rmse(labels, predictions):
        pred_values = predictions['predictions']
        return {'rmse': tf.metrics.root_mean_squared_error(labels, pred_values)}


    # Create estimator to train and evaluate
    def train_and_evaluate(output_dir):

        estimator = tf.estimator.DNNLinearCombinedRegressor(...)

        # attach the custom metric so it is computed and written out
        # at every evaluation
        estimator = tf.contrib.estimator.add_metrics(estimator, my_rmse)

        train_spec = ...
        exporter = ...
        eval_spec = tf.estimator.EvalSpec(
            input_fn = ...,
            start_delay_secs = 60,  # start evaluating after N seconds
            throttle_secs = 300,    # evaluate at most every N seconds
            exporters = exporter)
        tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)
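
    On the configuration side, here is a minimal sketch of the matching hptuning_config.yaml. The goal, trial counts, and the learning_rate parameter entry are illustrative assumptions rather than values from the question; the essential part is that hyperparameterMetricTag names the new "rmse" metric instead of the loss:

    # hptuning_config.yaml -- minimal sketch; trial counts and the
    # learning_rate parameter below are illustrative assumptions
    trainingInput:
      hyperparameters:
        goal: MINIMIZE
        hyperparameterMetricTag: rmse  # must match the key returned by my_rmse
        maxTrials: 30
        maxParallelTrials: 3
        params:
        - parameterName: learning_rate
          type: DOUBLE
          minValue: 0.001
          maxValue: 0.1
          scaleType: UNIT_LOG_SCALE

    With this in place, the tuning service sees exactly one "rmse" value per evaluation, and at most one evaluation every throttle_secs, so it stays well under its limit.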