tensorflow-serving google-cloud-ml tensorflow google-cloud-ml-engine

Running distributed TensorFlow on Google Cloud ML Engine: what goes in ClusterSpec?

I am trying to run a large distributed TensorFlow model on Google Cloud ML Engine and am having trouble understanding what should go in tf.train.ClusterSpec.

When you run a job on Google Cloud you can select the scale tier from BASIC, STANDARD_1, PREMIUM_1, BASIC_GPU or CUSTOM, each giving you access to a different type of cluster. However, I can't find the names or addresses of the machines in these clusters.

Solution

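Please take a look at the ML Engine documentation and sample for distributed training. You don't need to look up the machine addresses yourself: the service sets the environment variable TF_CONFIG on each machine in the cluster, and you build the ClusterSpec from it, e.g.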

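  import json
  import os

  import tensorflow as tf

  # Excerpt from inside a function; run() is defined elsewhere in the sample.
  tf_config = os.environ.get('TF_CONFIG')

  # If TF_CONFIG is not set, run locally
  if not tf_config:
    return run('', True, *args, **kwargs)

  # TF_CONFIG is a JSON string; 'cluster' maps each job name to its host:port list
  tf_config_json = json.loads(tf_config)
  cluster = tf_config_json.get('cluster')
  ...
  cluster_spec = tf.train.ClusterSpec(cluster)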
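
To make the shape of TF_CONFIG more concrete, here is a minimal sketch that parses it without importing TensorFlow. The helper name parse_tf_config and the host names in the example value are hypothetical, chosen only for illustration; the real addresses are assigned by the service. The sketch assumes the usual layout with a 'cluster' entry and a 'task' entry holding 'type' and 'index'.

```python
import json
import os


def parse_tf_config(environ=os.environ):
    """Return (cluster, task_type, task_index); cluster is None if TF_CONFIG is unset."""
    raw = environ.get("TF_CONFIG")
    if not raw:
        # No TF_CONFIG: treat this as a local, single-process run.
        return None, None, 0
    config = json.loads(raw)
    cluster = config.get("cluster")
    task = config.get("task", {})
    return cluster, task.get("type"), task.get("index", 0)


# Hypothetical TF_CONFIG value; real host names come from the service.
example_env = {
    "TF_CONFIG": json.dumps({
        "cluster": {
            "master": ["master-0.example:2222"],
            "ps": ["ps-0.example:2222"],
            "worker": ["worker-0.example:2222", "worker-1.example:2222"],
        },
        "task": {"type": "worker", "index": 1},
    })
}

cluster, task_type, task_index = parse_tf_config(example_env)
print(task_type, task_index, sorted(cluster))

# With TensorFlow 1.x installed, the next steps would typically be:
#   cluster_spec = tf.train.ClusterSpec(cluster)
#   server = tf.train.Server(cluster_spec, job_name=task_type, task_index=task_index)
```

The 'task' entry tells each process which role it plays in the cluster, so the same training script can be launched unchanged on every machine.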
