I am very new to Databricks, so please bear with me. Here is my requirement:
How can I perform this activity? I assume there is no one-shot process. I was planning to create a notebook and run it via Azure Data Factory: pump the data into Blob storage and then use .NET to send it to Event Hub. But from Azure Data Factory we can only run an Azure Databricks notebook, not store its output anywhere.
Azure Databricks does support Azure Event Hubs as both a source and a sink. Have a look at Structured Streaming - it is the stream processing engine in Apache Spark (and it is available in Azure Databricks as well).
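For context, reading from Event Hubs as a streaming source looks roughly like this with the azure-event-hubs-spark connector (a minimal sketch; the namespace, key and hub name are placeholders you would fill in):
import org.apache.spark.eventhubs.{ConnectionStringBuilder, EventHubsConf, EventPosition}

val sourceConnectionString = ConnectionStringBuilder("Endpoint=sb://<namespace>.servicebus.windows.net/;SharedAccessKeyName=<key name>;SharedAccessKey=<key>")
  .setEventHubName("<event hub name>")
  .build

val ehReadConf = EventHubsConf(sourceConnectionString)
  .setStartingPosition(EventPosition.fromEndOfStream)

// Each row carries body, partition, offset, sequenceNumber, enqueuedTime, ...
val incoming = spark.readStream
  .format("eventhubs")
  .options(ehReadConf.toMap)
  .load()
  .selectExpr("CAST(body AS STRING) AS body") // body arrives as binary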
Create a notebook to do all your transformations (joins, aggregations, ...) - the snippets below assume you are doing a batch write to Azure Event Hubs.
Using Scala
// Requires the azure-event-hubs-spark connector library on the cluster
import org.apache.spark.eventhubs.EventHubsConf

val connectionString = "Valid EventHubs connection string."
val ehWriteConf = EventHubsConf(connectionString)

// Write the "body" column of df to Event Hubs as a batch
df.select("body")
  .write
  .format("eventhubs")
  .options(ehWriteConf.toMap)
  .save()
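Note that df here is expected to already have a string (or binary) column named body. As a rough sketch (the sales table and column names are made up), a batch aggregation could be packed into that column like this:
import org.apache.spark.sql.functions.{col, struct, sum, to_json}

// Hypothetical input table: aggregate per customer and serialize each row to JSON
val df = spark.table("sales")
  .groupBy("customerId")
  .agg(sum("amount").as("totalAmount"))
  .select(to_json(struct(col("customerId"), col("totalAmount"))).as("body"))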
Replace .write with .writeStream if your query is streaming.
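In Scala the streaming version would then look roughly like this (assuming df is a streaming DataFrame; the checkpoint path is just a placeholder):
val query = df
  .select("body")
  .writeStream
  .format("eventhubs")
  .options(ehWriteConf.toMap)
  .option("checkpointLocation", "/tmp/eventhubs-checkpoint")
  .start()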
Using PySpark
connectionString = "Valid EventHubs connection string."
# The PySpark writer takes a plain dict of connector options
ehConf = {
  'eventhubs.connectionString': connectionString
}
# checkpointLocation is a placeholder; point it at a durable directory (e.g. a DBFS path)
ds = df \
  .select("body") \
  .writeStream \
  .format("eventhubs") \
  .options(**ehConf) \
  .option("checkpointLocation", "/tmp/eventhubs-checkpoint") \
  .start()
One more thing to consider when working with Azure Event Hubs is partitioning. It is optional: you can send just the body, in which case events are distributed across the partitions in a round-robin fashion.
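If you do want to control how events are distributed, the connector's sink can also pick up a partitionKey (or partitionId) column from the DataFrame you write; a small sketch, assuming a hypothetical deviceId column:
// Events with the same deviceId land on the same Event Hubs partition
df.select(df("deviceId").as("partitionKey"), df("body"))
  .write
  .format("eventhubs")
  .options(ehWriteConf.toMap)
  .save()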