Tags: python, machine-learning, tensorflow, google-cloud-ml-engine

Google Cloud ML Engine: Create model version failed


I have successfully trained a TensorForestEstimator on Google Cloud's ML Engine, but when I try to create a model version I get the following error:

Create Version failed. Bad model detected with error: "Error loading the model: Could not load model. "

I am training and deploying with TensorFlow 1.3. The Experiment is configured as follows:

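# TensorFlow 1.3 contrib imports used by the configuration below.
from tensorflow.contrib.learn import Experiment
from tensorflow.contrib.learn.python.learn.utils.saved_model_export_utils import make_export_strategy
from tensorflow.contrib.tensor_forest.client.random_forest import TensorForestEstimator
from tensorflow.contrib.tensor_forest.python.tensor_forest import ForestHParams, RandomForestGraphs

# get_input_fn, get_eval_metrics and serving_input_fn are defined elsewhere in the trainer package.
def get_experiment_fn(args):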
    def _experiment(run_config, hparams):
        return Experiment(
            estimator=TensorForestEstimator(
                params=ForestHParams(
                    num_trees=args.num_trees,
                    max_nodes=10000,
                    min_split_samples=2,
                    num_features=8,
                    num_classes=args.num_projections,
                    regression=True
                ),
                model_dir=args.job_dir,
                graph_builder_class=RandomForestGraphs,
                config=run_config,
                keys_name=None,
                report_feature_importances=True
            ),
            train_input_fn=get_input_fn(
                project_name=args.project,
                data_location=args.train_data,
                dataset_size=args.train_size,
                batch_size=args.train_batch_size
            ),
            train_steps=args.train_steps,
            eval_input_fn=get_input_fn(
                project_name=args.project,
                data_location=args.eval_data,
                dataset_size=args.eval_size,
                batch_size=args.eval_batch_size
            ),
            eval_steps=args.eval_steps,
            eval_metrics=get_eval_metrics(),
            export_strategies=[
                make_export_strategy(
                    serving_input_fn,
                    default_output_alternative_key=None,
                    exports_to_keep=1
                )
            ]
        )
    return _experiment

What is the issue?


Solution

  • As of this writing, Google Cloud ML Engine appears to support serving only models produced with TensorFlow 1.2 and earlier. See the runtime version list: https://cloud.google.com/ml-engine/docs/concepts/runtime-version-list

    If possible, use --runtime-version 1.2 so that the model is produced and served by the same TensorFlow release. If you depend on a feature specific to TensorFlow 1.3, you will need to host the model yourself using Flask on Google App Engine until ML Engine adds support for TensorFlow 1.3.
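The advice above comes down to two checks: the TensorFlow release that exported the model should not be newer than the serving runtime, and the version should be created with an explicit --runtime-version. Below is a minimal Python sketch of both. It assumes the rule "runtime major.minor must be at least the training major.minor". The helper names (parse_major_minor, check_servable, version_create_command) and the model, version and bucket names in the usage lines are hypothetical. The code only builds the `gcloud ml-engine versions create` argument list and does not call gcloud. Newer Cloud SDKs expose the same command as `gcloud ai-platform versions create`.

```python
def parse_major_minor(version):
    """Turn a version string such as '1.3.0' or '1.3.0-rc1' into (1, 3)."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def check_servable(training_tf_version, runtime_version):
    """Raise ValueError if the model was exported by a newer TensorFlow
    than the runtime that would serve it (assumed compatibility rule)."""
    trained = parse_major_minor(training_tf_version)
    runtime = parse_major_minor(runtime_version)
    # Tuples compare element by element, so (1, 3) > (1, 2).
    if trained > runtime:
        raise ValueError(
            "Model exported with TensorFlow %d.%d cannot be served by "
            "runtime %s; retrain with TensorFlow %s or wait for a newer "
            "runtime." % (trained[0], trained[1], runtime_version, runtime_version))


def version_create_command(model, version, origin, runtime_version="1.2"):
    """Build (but do not run) the gcloud command that creates a model
    version pinned to a specific runtime version."""
    return [
        "gcloud", "ml-engine", "versions", "create", version,
        "--model", model,
        "--origin", origin,
        "--runtime-version", runtime_version,
    ]


if __name__ == "__main__":
    # Hypothetical values: replace with your own TensorFlow version, model,
    # version name, and the timestamped SavedModel export directory.
    check_servable("1.2.1", "1.2")
    cmd = version_create_command(
        "forest_model", "v1", "gs://my-bucket/job_dir/export/Servo/1500000000")
    print(" ".join(cmd))
```

In the training environment you would pass `tf.__version__` as the first argument to check_servable. You could then hand the returned list to `subprocess.check_call` to run the command.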