GCP not detecting correct python version when submitting ML training job


I am trying to submit a TPU ML training job on GCP using this:

> !gcloud ai-platform jobs submit training `whoami`_object_detection_`date +%s` \
> --job-dir=gs://dota-1/train \
> --packages dist/object_detection-0.1.tar.gz,slim/dist/slim-0.1.tar.gz,/tmp/pycocotools/pycocotools-2.0.tar.gz \
> --module-name object_detection.model_tpu_main \
> --runtime-version 2.6 \
> --scale-tier BASIC_TPU \
> --region us-central1 \
> -- \
> --model_dir=gs://dota-1/train \
> --tpu_zone us-central1 \
> --python-version 3.7 \
> --pipeline_config_path=gs://dota-1/data/pipeline.config

But it gives me the following error and does not detect the right python version:

ERROR: (gcloud.ai-platform.jobs.submit.training) INVALID_ARGUMENT: Field: runtime_version Error: The specified runtime version '2.6' with the Python version '' is not supported or is deprecated. Please specify a different runtime version. See https://cloud.google.com/ml-engine/docs/runtime-version-list for a list of supported versions.
- '@type': type.googleapis.com/google.rpc.BadRequest
  fieldViolations:
  - description: The specified runtime version '2.6' with the Python version '' is
      not supported or is deprecated. Please specify a different runtime version.
      See https://cloud.google.com/ml-engine/docs/runtime-version-list for a list
      of supported versions.
    field: runtime_version

I ran !python --version and confirmed that Python 3.7 is installed, which is a version GCP supports.

How can I fix this?


Solution

  • This error concerns runtime version 2.6, which corresponds to TensorFlow 2.6. You have these options:

    • Runtime version 2.6 doesn't support batch prediction. You could use runtime version 1.15 or 2.1 instead.

    • Using a more recent version of TensorFlow than the latest supported runtime version on AI Platform Training is possible for training, but not for prediction.

      To use a version of TensorFlow that is not yet supported as a full AI Platform Training runtime version, include it as a custom dependency for your trainer using one of the following approaches:

      Specify the TensorFlow version in your setup.py file as a PyPI dependency. Include it in your list of required packages as follows:

    REQUIRED_PACKAGES = ['tensorflow>=2.6']

    See the AI Platform Training documentation for more details.

    • Change the runtime version by editing this flag, but you must have the package installed:

    --runtime-version 1.9

    See the runtime version list linked in the error message for the supported versions.
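
One more thing worth checking: the error reports the Python version as '' (empty). In the command from the question, --python-version 3.7 comes after the bare -- separator, and gcloud passes everything after -- to the training module instead of reading it as its own flag. The following is a minimal sketch, not a verified fix, that builds the command with the gcloud flags ahead of the separator. The helper build_submit_command and the job name are hypothetical, and the runtime version 2.1 is only an illustrative value taken from the options above, so check it against the supported list before use. The --packages flag is left out for brevity and should be kept from the original command.

```python
import shlex


def build_submit_command(job_name, gcloud_flags, trainer_args):
    """Return an argv list: gcloud flags first, then '--', then trainer args."""
    cmd = ["gcloud", "ai-platform", "jobs", "submit", "training", job_name]
    for flag, value in gcloud_flags.items():
        cmd += [f"--{flag}", str(value)]
    # Everything after the bare "--" is handed to the training module as-is.
    cmd.append("--")
    cmd += trainer_args
    return cmd


# Flags that gcloud itself must see. Values other than those copied from the
# question are assumptions for illustration.
gcloud_flags = {
    "job-dir": "gs://dota-1/train",
    "module-name": "object_detection.model_tpu_main",
    "runtime-version": "2.1",  # illustrative; verify against the supported list
    "python-version": "3.7",   # moved before "--" so gcloud reads it
    "scale-tier": "BASIC_TPU",
    "region": "us-central1",
}

# Arguments meant for object_detection.model_tpu_main, copied from the question.
trainer_args = [
    "--model_dir=gs://dota-1/train",
    "--tpu_zone", "us-central1",
    "--pipeline_config_path=gs://dota-1/data/pipeline.config",
]

# "my_object_detection_job" is a hypothetical job name.
command = build_submit_command("my_object_detection_job", gcloud_flags, trainer_args)
print(shlex.join(command))
```

Printing the command rather than running it makes it easy to confirm the flag order before submitting. This only addresses the empty Python version; whether the chosen runtime, Python version, and scale tier are supported together still depends on the runtime version list.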