Tags: apache-spark, apache-spark-sql, databricks, azure-databricks, spark-structured-streaming

Serverless or trigger-based processing in a Databricks Spark application


I have been working on a use case where we receive messages on an Event Hub/Kafka topic, and the rate is very low: roughly 5 to 10 events per hour.

Is there any serverless compute available in Databricks (not the SQL one) that will bring up a compute cluster to process those 4 to 5 events and then go back to a dormant state?

I don't want to keep a streaming job up and running 24x7 just to process 4-5 events an hour. Batch mode won't work either, since it runs on a fixed schedule. Is there a Databricks service/feature that can wake up, process those 4 or 5 events, and go back to sleep?


Solution

  • No, as of right now there is no serverless compute available to customers outside of SQL and Model Serving (but it's coming). Databricks also doesn't support triggering a job based on data arriving in a message bus; it only supports scheduled runs and so-called file arrival triggers. However, you can set up Event Hubs Capture for your topic and use a file arrival trigger to start a job whenever capture files are created (the file arrival trigger has a setting that can limit it to roughly one run per hour), as sketched below.
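
For illustration, here is a minimal PySpark sketch (not part of the original answer) of what the triggered job itself might look like: a task started by the file arrival trigger that drains the Avro files written by Event Hubs Capture using an `availableNow` trigger and then exits, so nothing keeps running between arrivals. The storage path, checkpoint location, and table name are placeholders you would need to adapt, and `availableNow=True` assumes a recent Databricks Runtime (Spark 3.3+).

```python
# Hypothetical job task kicked off by a Databricks file arrival trigger.
# It processes whatever Event Hubs Capture has written since the last run,
# then terminates instead of streaming 24x7.
from pyspark.sql.functions import col

# Placeholder: location where Event Hubs Capture lands its Avro files.
# Adjust the container, storage account, and directory layout to your capture settings.
capture_path = "abfss://<container>@<storage-account>.dfs.core.windows.net/<namespace>/<eventhub>/*/*/*/*/*/*/*"

# Structured Streaming file source over the capture Avro files. Capture stores
# the original event payload as bytes in the 'Body' column.
events = (
    spark.readStream
    .format("avro")
    .load(capture_path)
    .select(col("Body").cast("string").alias("body"))
)

# availableNow: process all files not yet recorded in the checkpoint, then stop.
query = (
    events.writeStream
    .format("delta")
    .option("checkpointLocation", "/mnt/checkpoints/eventhub_capture")  # placeholder path
    .trigger(availableNow=True)
    .toTable("events_bronze")  # placeholder target table
)
query.awaitTermination()
```

Because the checkpoint tracks which capture files have already been consumed, each triggered run only handles the handful of new events and the job cluster shuts down afterwards.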