GCP Dataproc: create cluster with Stackdriver logging activated


Using GCP, I instantiate workflows for my processing. I'd like to activate Stackdriver logging to have more metrics (see https://cloud.google.com/dataproc/docs/guides/stackdriver-logging).

According to the documentation, I should set this property:

dataproc:dataproc.logging.stackdriver.job.driver.enable=true

My workflow template looks like:

placement:
  managedCluster:
    clusterName: my-cluster
    config:
      gceClusterConfig:
        zoneUri: europe-west1-d
      masterConfig:
        machineTypeUri: n1-standard-4
      workerConfig:
        machineTypeUri: n1-standard-4
        numInstances: 10

Where should I set this property?

Thx.


Solution

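  • The template below should work. Cluster properties go under softwareConfig.properties in the cluster config.

    Because the API hierarchy is deeply nested, it is easiest to build the initial template with the gcloud dataproc workflow-templates commands; the describe command then prints the correct YAML or JSON. You can then iterate quickly by instantiating the template inline from a local file (gcloud dataproc workflow-templates instantiate-from-file).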

    placement:
      managedCluster:
        clusterName: my-cluster
        config:
          gceClusterConfig:
            zoneUri: europe-west1-d
          masterConfig:
            machineTypeUri: n1-standard-4
          workerConfig:
            machineTypeUri: n1-standard-4
            numInstances: 10
          softwareConfig:
            properties:
              dataproc:dataproc.logging.stackdriver.job.driver.enable: 'true'
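
If the template is generated by a script rather than edited by hand, the same nesting can be built programmatically. The following is only a minimal sketch: `set_cluster_property` is a hypothetical helper (not part of any Google library), and it assumes the template is held as a plain dict mirroring the YAML above. It shows where the property key ends up, and that the value is stored as the string "true".

```python
import json

# Property named in the question. Values in the properties map are strings.
DRIVER_LOGGING_KEY = "dataproc:dataproc.logging.stackdriver.job.driver.enable"


def set_cluster_property(template, key, value):
    """Hypothetical helper: set a cluster property on a workflow template dict.

    Creates placement.managedCluster.config.softwareConfig.properties
    if any level is missing, then stores the key/value pair there.
    """
    config = (
        template.setdefault("placement", {})
        .setdefault("managedCluster", {})
        .setdefault("config", {})
    )
    properties = config.setdefault("softwareConfig", {}).setdefault("properties", {})
    properties[key] = value
    return template


# Same cluster definition as in the question, as a plain dict.
template = {
    "placement": {
        "managedCluster": {
            "clusterName": "my-cluster",
            "config": {
                "gceClusterConfig": {"zoneUri": "europe-west1-d"},
                "masterConfig": {"machineTypeUri": "n1-standard-4"},
                "workerConfig": {
                    "machineTypeUri": "n1-standard-4",
                    "numInstances": 10,
                },
            },
        }
    }
}

set_cluster_property(template, DRIVER_LOGGING_KEY, "true")

# Print the resulting structure for inspection.
print(json.dumps(template, indent=2))
```

The printed structure has the same shape as the YAML template above, with the property sitting next to gceClusterConfig, masterConfig and workerConfig under config.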