Following the advice in this question, I have been running a Python app on AI Platform that uses TensorFlow to run simulations and writes the results to a CSV file. I have been using Jupyter, following this.
Works great, and I have increased my VM's size to run it faster.
Now, how do I add machines to make it run even faster, perhaps using Spark and/or Dataproc, or, ideally, something simpler?
AI Platform Notebooks run on a single machine. To process the data on a cluster of machines, you can use a Jupyter notebook on Dataproc. To have this auto-configured, create a cluster similar to:
CLUSTER_NAME=<cluster_name>
REGION=<gce_region>

# Create a Dataproc cluster with Jupyter and the TonY initialization action,
# exposing the notebook UI through the Component Gateway.
gcloud beta dataproc clusters create ${CLUSTER_NAME} \
--region ${REGION} \
--optional-components ANACONDA,JUPYTER \
--initialization-actions gs://goog-dataproc-initialization-actions-${REGION}/tony/tony.sh \
--enable-component-gateway
This will provide a Spark cluster with a Jupyter notebook already configured, plus a framework for running TensorFlow on the cluster (TonY, i.e. TensorFlow on YARN).
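As a rough sketch of how a distributed TensorFlow job could then be submitted through TonY (the jar path, config file, and training script below are placeholders; the actual locations depend on what the tony init action installed on your cluster), it looks something like:

# Submit a TensorFlow job to the cluster via TonY's ClusterSubmitter.
# The jar path, conf file, source dir, and script name are placeholders --
# adjust them to match your cluster and training code.
gcloud dataproc jobs submit hadoop \
--cluster ${CLUSTER_NAME} \
--region ${REGION} \
--class com.linkedin.tony.cli.ClusterSubmitter \
--jars file:///opt/tony/TonY-samples/tony-cli-all.jar \
-- \
--src_dir=/path/to/your/training/code \
--executes your_training_script.py \
--conf_file=/path/to/tony.xml \
--python_binary_path=python3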
For more on Dataproc notebooks check out: https://medium.com/google-cloud/apache-spark-and-jupyter-notebooks-made-easy-with-dataproc-component-gateway-fa91d48d6a5a
And for more on TonY, check out this post.
If you're looking for more of a serverless approach, you could also check out AI Platform distributed training:
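As a rough sketch (the job name, bucket, package path, module name, and versions below are placeholders), you package your trainer as a Python module and submit it; the --scale-tier flag is what requests a multi-machine configuration instead of a single VM:

# Submit a training job to AI Platform; --scale-tier (e.g. STANDARD_1)
# requests multiple workers and parameter servers.
# Job name, bucket, paths, and versions below are placeholders.
gcloud ai-platform jobs submit training my_tf_job_001 \
--region ${REGION} \
--runtime-version 1.14 \
--python-version 3.5 \
--scale-tier STANDARD_1 \
--job-dir gs://my-bucket/jobs/my_tf_job_001 \
--package-path ./trainer \
--module-name trainer.task

With this approach there is no cluster to manage: you pay per job, and the machines are provisioned and torn down for you.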