Tags: tensorflow, google-cloud-ml, hyperparameters

Too many hyperparameter tuning metrics written out


My hyperparameter tuning job on Cloud ML Engine fails with the error message:

Too many hyperparameter tuning metrics were written by Hyperparameter Tuning Trial #...

How do I fix this?


Solution

  • First, check that you are not in fact writing out too many evaluation metrics. Are you specifying an appropriate throttle_secs in the EvalSpec? Evaluation (and therefore metric writing) happens at most once every throttle_secs.

  • Second, check the metric you are tuning on. Is it the loss? The loss is written out during both training and evaluation, so if your tuning metric is the loss, the tuning service also picks up the training-time values and sees far more metric values than it expects.

    The simplest fix is to define a new evaluation metric and use that metric ("rmse" in my example) as your hyperparameterMetricTag in the tuning configuration; see the configuration sketch after the code below.

    Here's an example that shows both of these fixes:

    import tensorflow as tf

    # create a dedicated metric for hyperparameter tuning, so we are
    # not tuning on the loss (which is also reported during training)
    def my_rmse(labels, predictions):
        pred_values = predictions['predictions']
        return {'rmse': tf.metrics.root_mean_squared_error(labels, pred_values)}


    # Create estimator to train and evaluate
    def train_and_evaluate(output_dir):

        estimator = tf.estimator.DNNLinearCombinedRegressor(...)

        # attach the custom metric so it is computed and written out
        # at every evaluation
        estimator = tf.contrib.estimator.add_metrics(estimator, my_rmse)

        train_spec = ...
        exporter = ...
        eval_spec = tf.estimator.EvalSpec(
            input_fn = ...,
            start_delay_secs = 60,  # start evaluating after N seconds
            throttle_secs = 300,    # evaluate at most every N seconds
            exporters = exporter)
        tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)
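
    On the configuration side, here is a minimal sketch of the matching hptuning_config.yaml. The goal, trial counts, and the learning_rate parameter entry are illustrative assumptions rather than values from the question; the essential part is that hyperparameterMetricTag names the new "rmse" metric instead of the loss:

    # hptuning_config.yaml -- minimal sketch; trial counts and the
    # learning_rate parameter below are illustrative assumptions
    trainingInput:
      hyperparameters:
        goal: MINIMIZE
        hyperparameterMetricTag: rmse  # must match the key returned by my_rmse
        maxTrials: 30
        maxParallelTrials: 3
        params:
        - parameterName: learning_rate
          type: DOUBLE
          minValue: 0.001
          maxValue: 0.1
          scaleType: UNIT_LOG_SCALE

    With this in place, the tuning service sees exactly one "rmse" value per evaluation, and at most one evaluation every throttle_secs, so it stays well under its limit.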