Following the advice in this question, I have been running a Python app on AI Platform that uses TensorFlow to run simulations and writes the results to a CSV file. I have been using Jupyter, following this.
Works great, and I have increased my VM's size to run it faster.
Now, how do I add machines to make it run even faster, perhaps using Spark and/or Dataproc, or, ideally, something simpler?
AI Platform Notebooks run on a single machine. To process the data on a cluster of machines, you can use a Jupyter notebook on Dataproc. To have this auto-configured, create a cluster similar to:
CLUSTER_NAME=<cluster_name>
REGION=<gce_region>

# Create a Dataproc cluster with Jupyter and the TonY initialization action,
# exposing the notebook UI through the Component Gateway.
gcloud beta dataproc clusters create ${CLUSTER_NAME} \
--region ${REGION} \
--optional-components ANACONDA,JUPYTER \
--initialization-actions gs://goog-dataproc-initialization-actions-${REGION}/tony/tony.sh \
--enable-component-gateway
This will provide a Spark cluster with a Jupyter notebook already configured, plus a framework for running TensorFlow on the cluster (TonY, i.e. TensorFlow on YARN).
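As a rough sketch of how a distributed TensorFlow job could then be submitted through TonY (the jar path, config file, and training script below are placeholders; the actual locations depend on what the tony init action installed on your cluster), it looks something like:

# Submit a TensorFlow job to the cluster via TonY's ClusterSubmitter.
# The jar path, conf file, source dir, and script name are placeholders --
# adjust them to match your cluster and training code.
gcloud dataproc jobs submit hadoop \
--cluster ${CLUSTER_NAME} \
--region ${REGION} \
--class com.linkedin.tony.cli.ClusterSubmitter \
--jars file:///opt/tony/TonY-samples/tony-cli-all.jar \
-- \
--src_dir=/path/to/your/training/code \
--executes your_training_script.py \
--conf_file=/path/to/tony.xml \
--python_binary_path=python3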
For more on Dataproc notebooks check out: https://medium.com/google-cloud/apache-spark-and-jupyter-notebooks-made-easy-with-dataproc-component-gateway-fa91d48d6a5a
And for more on TonY, check out this post.
If you're looking for more of a serverless approach, you could also check out AI Platform distributed training:
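As a rough sketch (the job name, bucket, package path, module name, and versions below are placeholders), you package your trainer as a Python module and submit it; the --scale-tier flag is what requests a multi-machine configuration instead of a single VM:

# Submit a training job to AI Platform; --scale-tier (e.g. STANDARD_1)
# requests multiple workers and parameter servers.
# Job name, bucket, paths, and versions below are placeholders.
gcloud ai-platform jobs submit training my_tf_job_001 \
--region ${REGION} \
--runtime-version 1.14 \
--python-version 3.5 \
--scale-tier STANDARD_1 \
--job-dir gs://my-bucket/jobs/my_tf_job_001 \
--package-path ./trainer \
--module-name trainer.task

With this approach there is no cluster to manage: you pay per job, and the machines are provisioned and torn down for you.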