azure azure-databricks

Azure: Create alert when Maximum number of Workers is reached


In Azure, I enabled all diagnostic logs for my Databricks workspace. I looked at all the tables, especially DatabricksClusters and Usage, but I didn't find any entry that would help me create an alert when the maximum number of workers is reached. I want to monitor Databricks to find out when I have to increase the upper worker limit/SKU.


Solution

  • There are a few approaches to this:

    1. Use diagnostic logs with Log Analytics. Diagnostic logs include cluster events, of which the resize and resizeResult events are relevant here. The resize event is used primarily by DLT pipelines; for all other clusters we need the resizeResult event, which includes a clusterWorkers field with the number of workers allocated after the resize. The main problem with this approach is that the event doesn't include the max_workers value, so you need to join it with the create and edit events to obtain the maximum number of workers. This can be problematic if the cluster configuration was last changed a long time ago and those events are no longer retained in Log Analytics.

    2. Databricks recently started a public preview of so-called system tables, which contain the same information as the diagnostic logs (with more tables coming) but store it for a longer time, so it's easier to join events like resizeResult with cluster information. You can then use Databricks SQL Alerts to send notifications. You can find more information about using system tables for notifications in the recent blog post, which also contains reusable queries.

    3. Set up project Overwatch, which consolidates diagnostic logs, cluster logs, and some other information to provide better insight into what happens in the workspace and in individual clusters. Note, however, that Overwatch is slowly being replaced by system tables.
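
The join described in the first approach can be sketched in plain Python. This is only an illustration of the logic, not the actual log schema: it assumes the cluster events have already been exported and flattened into dictionaries, and the keys `timestamp`, `action`, and `cluster_id`, as well as the function name `find_max_worker_hits`, are hypothetical. Only `clusterWorkers`, `max_workers`, and the event names `create`, `edit`, and `resizeResult` come from the answer above.

```python
from typing import Any, Dict, Iterable, List


def find_max_worker_hits(events: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return the resizeResult events where a cluster reached its max_workers.

    The record layout (timestamp/action/cluster_id keys) is an assumption
    made for this sketch; real diagnostic log rows need to be mapped to it.
    """
    latest_limit: Dict[str, int] = {}  # cluster_id -> most recent max_workers
    hits: List[Dict[str, Any]] = []

    # Process events in time order so each resizeResult is compared against
    # the limit that was configured at that moment.
    for event in sorted(events, key=lambda e: e["timestamp"]):
        cluster_id = event["cluster_id"]
        action = event["action"]

        if action in ("create", "edit"):
            limit = event.get("max_workers")
            if limit is not None:
                latest_limit[cluster_id] = limit
        elif action == "resizeResult":
            limit = latest_limit.get(cluster_id)
            # Without a retained create/edit event the limit is unknown,
            # so the resize cannot be evaluated and is skipped.
            if limit is not None and event["clusterWorkers"] >= limit:
                hits.append(
                    {
                        "timestamp": event["timestamp"],
                        "cluster_id": cluster_id,
                        "workers": event["clusterWorkers"],
                        "max_workers": limit,
                    }
                )
    return hits


# Made-up sample data, only to exercise the function.
SAMPLE_EVENTS = [
    {"timestamp": "2024-01-01T00:00:00Z", "action": "create",
     "cluster_id": "c1", "max_workers": 8},
    {"timestamp": "2024-01-01T01:00:00Z", "action": "resizeResult",
     "cluster_id": "c1", "clusterWorkers": 4},
    {"timestamp": "2024-01-01T02:00:00Z", "action": "resizeResult",
     "cluster_id": "c1", "clusterWorkers": 8},
    {"timestamp": "2024-01-01T03:00:00Z", "action": "edit",
     "cluster_id": "c1", "max_workers": 16},
    {"timestamp": "2024-01-01T04:00:00Z", "action": "resizeResult",
     "cluster_id": "c1", "clusterWorkers": 8},
    # No create/edit event retained for c2, so its limit is unknown.
    {"timestamp": "2024-01-01T05:00:00Z", "action": "resizeResult",
     "cluster_id": "c2", "clusterWorkers": 10},
]

hits = find_max_worker_hits(SAMPLE_EVENTS)
```

With the sample data, only the resize of `c1` to 8 workers before its limit was raised counts as a hit. The `c2` event shows the retention problem mentioned above: when the create or edit event is no longer available, the resize cannot be compared against any limit.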