python, apache-spark, google-cloud-platform, google-cloud-dataproc, gcp-ai-platform-notebook

How do I add machines to GCP AI Platform?


Following this question's advice, I have been running a Python app on AI Platform that uses TensorFlow to run simulations and writes the results to a CSV file. I have been using Jupyter, following this.

Works great, and I have increased my VM's size to run it faster.

Now how do I add machines to make it run even faster, maybe using Spark and/or Dataproc or, ideally, something simpler?


Solution

  • AI Platform notebooks run on a single machine. To process the data with a cluster of machines, you can use a Jupyter notebook on Dataproc. To have this auto-configured, create a cluster along these lines:

    CLUSTER_NAME=<cluster_name>
    REGION=<gce_region>
    gcloud beta dataproc clusters create ${CLUSTER_NAME} \
      --region ${REGION} \
      --optional-components ANACONDA,JUPYTER \
      --initialization-actions gs://goog-dataproc-initialization-actions-${REGION}/tony/tony.sh \
      --enable-component-gateway
    

    This will provide a Spark cluster that has a Jupyter notebook configured, plus a framework for running TensorFlow on a cluster (TonY).
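
    As a concrete illustration, here is a minimal PySpark sketch of how independent simulation runs could be fanned out across the cluster from the Dataproc Jupyter notebook, where a spark session is already available. The run_simulation function and the bucket path are hypothetical placeholders standing in for your existing per-run TensorFlow logic:

    def run_simulation(seed):
        # Placeholder for your existing TensorFlow simulation;
        # each call handles one independent run and returns its result.
        return (seed, 0.0)

    seeds = list(range(1000))

    # Distribute the independent runs across the cluster's executors
    rdd = spark.sparkContext.parallelize(seeds, numSlices=100).map(run_simulation)

    # Gather the results into a DataFrame and write them out as CSV on GCS
    df = rdd.toDF(["seed", "result"])
    df.write.csv("gs://<your-bucket>/simulation_results", header=True)

    Because each simulation run is independent, this maps cleanly onto Spark's data-parallel model: adding worker nodes to the cluster increases throughput without changing the code.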

    For more on Dataproc notebooks check out: https://medium.com/google-cloud/apache-spark-and-jupyter-notebooks-made-easy-with-dataproc-component-gateway-fa91d48d6a5a

    And for more on TonY, check out this post.

    If you're looking for more of a serverless approach, you could also check out AI Platform distributed training.
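
    As a rough sketch of what that looks like, a packaged trainer can be submitted with a distributed scale tier, and AI Platform provisions the machines for you. The job name, module, package path, and bucket below are placeholders:

    gcloud ai-platform jobs submit training my_simulation_job \
      --region us-central1 \
      --module-name trainer.task \
      --package-path ./trainer \
      --job-dir gs://<your-bucket>/output \
      --runtime-version 2.1 \
      --python-version 3.7 \
      --scale-tier STANDARD_1

    Here --scale-tier STANDARD_1 requests a predefined multi-worker configuration, so no cluster needs to be created or torn down manually.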