monitoring, databricks, spark-structured-streaming, datadog

Differentiate Databricks streaming queries in Datadog


I am trying to set up a dashboard in Datadog that shows the streaming metrics for my streaming job. The job contains two tasks: one has 2 streaming queries and the other has 4 (both tasks use the same cluster). I followed the instructions here to install the Datadog Agent on the driver node. However, when I try to create a dashboard in Datadog, there is no way to differentiate between the 6 streaming queries, so they are all lumped together (none of the metric tags differ per query).


Solution

  • After some digging, I found an option called enable_query_name_tag that you can enable via the init script. It is disabled by default because it can create a very large number of tags when you are not using query names.

    The modification is shown here:

    instances:
        - spark_url: http://\$DB_DRIVER_IP:\$DB_DRIVER_PORT
          spark_cluster_mode: spark_standalone_mode
          cluster_name: \${hostip}
          streaming_metrics: true
          enable_query_name_tag: true  # <---- added
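
The remark about "not using query names" suggests the tag is only useful when each streaming query has an explicit, stable name. A minimal PySpark sketch of naming a query, assuming `df` is a streaming DataFrame; the helper name `start_named_query`, the `noop` sink, and the checkpoint path are illustrative assumptions, not part of the original setup:

```python
def start_named_query(df, query_name, checkpoint_path):
    """Start a streaming write with an explicit query name.

    `df` is expected to be a streaming DataFrame. The sink and the
    checkpoint location below are placeholders for illustration.
    """
    return (
        df.writeStream
        .queryName(query_name)   # explicit name for this streaming query
        .format("noop")          # placeholder sink (available in Spark 3.0+)
        .option("checkpointLocation", checkpoint_path)
        .start()
    )
```

For example, `start_named_query(orders_df, "orders_bronze", "/tmp/checkpoints/orders_bronze")` (hypothetical names) gives that query a fixed name. With one distinct name per query, the 6 queries should then be separable by the query name tag in Datadog. Unnamed queries are presumably where the large number of tags mentioned above would come from.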