
How to Deploy Spark Structured Streaming on Databricks for Production


On Databricks I created a notebook with Structured Streaming jobs. Is it OK to keep them running in the notebook, or do they have to be deployed somewhere else in a production setup?


Solution

  • It's completely fine to run a streaming job in production as a notebook. But when the job starts to become more complex, with many transformations, you may need to think about how to modularize it and make it testable, either by splitting it into multiple notebooks and using %run to include them, or by using Databricks Repos + Python modules.

    P.S. You can find an example of notebook testing in the following repository.
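To illustrate the "Python modules" route the answer suggests, here is a minimal sketch of factoring the business logic out of the notebook into a plain module so it can be unit-tested without a cluster. All names here (`enrich_event`, the `status`/`is_error` fields, the paths) are hypothetical examples, not from the original answer:

```python
# transformations.py -- hypothetical module kept in a Databricks Repo.
# The business logic is a pure Python function, so unit tests can run
# it locally or in CI without a SparkSession.

def enrich_event(event: dict) -> dict:
    """Flag HTTP-style error events (illustrative logic only)."""
    enriched = dict(event)
    enriched["is_error"] = event.get("status", 0) >= 400
    return enriched


# In the streaming notebook, the same rule would be applied to the
# stream as a column expression (sketch, assuming a Delta source):
#
#   import pyspark.sql.functions as F
#   df = spark.readStream.format("delta").load("/path/to/input")
#   out = df.withColumn("is_error", F.col("status") >= 400)
#   (out.writeStream.format("delta")
#       .option("checkpointLocation", "/path/to/checkpoint")
#       .start("/path/to/output"))
```

A unit test then only needs the pure function, e.g. `assert enrich_event({"status": 500})["is_error"]`; the streaming wiring stays thin and notebook-resident.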