
Multiple Hadoop clusters in one Google Cloud project


Is it possible to deploy several Hadoop clusters in one Google Cloud project?


Solution

  • With bdutil you can deploy any number of Hadoop clusters in a single Google Cloud project, as long as you have enough Google Compute Engine quota. The bdutil documentation describes its usage in full, but in short, bdutil distinguishes clusters only by their name prefix, set with the PREFIX variable or the --prefix flag. It's up to you to keep track of the zone and number of workers for each cluster.

    To keep track of multiple clusters easily, it's highly recommended to use bdutil's generate_config command. For example, suppose you want three clusters (test, staging and prod) of different sizes and in different zones. You'd run something like:

    ./bdutil --prefix my-test-cluster -n 2 -z us-central1-f -b test-bucket  \
        generate_config test-cluster_env.sh
    
    ./bdutil --prefix my-staging-cluster -n 5 -z us-central1-b -b staging-bucket  \
        generate_config staging-cluster_env.sh
    
    ./bdutil --prefix my-prod-cluster -n 10 -z us-central1-f -b prod-bucket  \
        generate_config prod-cluster_env.sh
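
    If the three invocations above start to feel repetitive, they can be driven from a small table. This is only a minimal sketch that reuses the flags and values shown above. The BDUTIL and DRY_RUN variables and the generate_cluster_configs function are hypothetical names introduced for illustration, not part of bdutil. By default it only prints the commands it would run:

```shell
# Sketch: generate one bdutil env file per cluster from a small table.
# BDUTIL, DRY_RUN and generate_cluster_configs are illustrative names,
# not part of bdutil itself.

BDUTIL="${BDUTIL:-./bdutil}"   # path to the bdutil script
DRY_RUN="${DRY_RUN:-1}"        # 1 = only print the commands, 0 = run them

generate_cluster_configs() {
  # Each table row is: name workers zone bucket
  while read -r name workers zone bucket; do
    [ -n "$name" ] || continue
    # Build the same command line as the hand-written examples.
    set -- "$BDUTIL" --prefix "my-${name}-cluster" -n "$workers" -z "$zone" \
        -b "$bucket" generate_config "${name}-cluster_env.sh"
    if [ "$DRY_RUN" = "1" ]; then
      printf '%s\n' "$*"
    else
      # Keep bdutil from reading the table that feeds this loop.
      "$@" </dev/null
    fi
  done <<'EOF'
test 2 us-central1-f test-bucket
staging 5 us-central1-b staging-bucket
prod 10 us-central1-f prod-bucket
EOF
}

generate_cluster_configs
```

    Setting DRY_RUN=0 before calling the function would run the commands for real. Adding a cluster is then a matter of adding one row to the table.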
    

    Once you've done that, the files test-cluster_env.sh, staging-cluster_env.sh and prod-cluster_env.sh can be used to refer to your three different clusters from now on. For example, suppose you want to delete your test cluster:

    ./bdutil -e test-cluster_env.sh delete
    

    Or just deploy your prod cluster:

    ./bdutil -e prod-cluster_env.sh deploy
    

    Or to SSH into the master of your staging cluster:

    ./bdutil -e staging-cluster_env.sh shell
    

    When you do it this way, you can store your *-cluster_env.sh files in source control, and they'll remain backwards compatible whenever you upgrade bdutil to a new Google release.
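
    Since the per-cluster commands above differ only in the env file and the action, a tiny shell function can save some typing. This is a sketch, assuming the env files sit in the current directory and follow the <name>-cluster_env.sh naming used above. The function name cluster and the BDUTIL variable are hypothetical, not something bdutil provides:

```shell
# Sketch: run a bdutil action against a named cluster's env file.
# "cluster" and BDUTIL are illustrative names, not part of bdutil.

BDUTIL="${BDUTIL:-./bdutil}"   # path to the bdutil script

cluster() {
  if [ "$#" -lt 2 ]; then
    echo "usage: cluster <name> <bdutil action...>" >&2
    return 2
  fi
  name="$1"
  shift
  env_file="${name}-cluster_env.sh"
  # Fail early instead of handing bdutil a file that does not exist.
  if [ ! -f "$env_file" ]; then
    echo "no such env file: $env_file" >&2
    return 1
  fi
  "$BDUTIL" -e "$env_file" "$@"
}
```

    With this in place, `cluster staging shell` would be equivalent to `./bdutil -e staging-cluster_env.sh shell`, and `cluster test delete` to `./bdutil -e test-cluster_env.sh delete`.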

    If you need to customize bdutil more extensively, you may want to consider obtaining bdutil from GitHub directly using:

    git clone https://github.com/GoogleCloudPlatform/bdutil.git
    

    That way you can use git to update to fresh versions of bdutil periodically, with git merging upstream changes into your customizations and flagging any conflicts.
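
    One way to do this, sketched under assumptions: keep your customizations as ordinary commits in your clone and merge upstream changes on top of them. The function name update_bdutil and its directory argument are hypothetical. Genuine conflicts still stop the merge and need to be resolved by hand:

```shell
# Sketch: pull upstream bdutil changes into a clone that carries local commits.
# update_bdutil is an illustrative name; the default path is an assumption.

update_bdutil() {
  repo_dir="${1:-bdutil}"   # path to your clone
  (
    cd "$repo_dir" || exit 1
    # Refuse to merge over uncommitted edits to tracked files.
    if [ -n "$(git status --porcelain --untracked-files=no)" ]; then
      echo "uncommitted changes in $repo_dir; commit them first" >&2
      exit 1
    fi
    # Merge the tracked upstream branch; git stops if there are real conflicts.
    git pull --no-rebase --no-edit
  )
}
```

    Running `update_bdutil path/to/bdutil` after committing your customizations would fetch and merge whatever the clone's upstream branch has gained since the last update.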