I've been trying to set up a proof of concept where Azure Databricks reads data from my Event Hub using the following code:
connectionString = "Endpoint=sb://mystream.servicebus.windows.net/;EntityPath=theeventhub;SharedAccessKeyName=event_test;SharedAccessKey=mykeygoeshere12345"
ehConf = {
    'eventhubs.connectionString': connectionString
}

df = spark \
    .readStream \
    .format("eventhubs") \
    .options(**ehConf) \
    .load()
readEventStream = df.withColumn("body", df["body"].cast("string"))
display(readEventStream)
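For reference, the connection string packs several semicolon-separated key=value fields (Endpoint, EntityPath, SharedAccessKeyName, SharedAccessKey). A minimal sketch, using a placeholder string rather than real credentials, that splits one into a dict so each field can be sanity-checked:

```python
# Split an Event Hubs connection string into its semicolon-separated
# key=value fields. The string below is a placeholder, not a real key.
connection_string = (
    "Endpoint=sb://mystream.servicebus.windows.net/;"
    "EntityPath=theeventhub;"
    "SharedAccessKeyName=event_test;"
    "SharedAccessKey=mykeygoeshere12345"
)

def parse_conn_str(conn_str):
    """Return a dict of the key=value fields in a connection string."""
    fields = {}
    for part in conn_str.split(";"):
        if part:
            # partition on the first '=' only, since values such as
            # "sb://..." or base64 keys may themselves contain '='
            key, _, value = part.partition("=")
            fields[key] = value
    return fields

fields = parse_conn_str(connection_string)
print(fields["EntityPath"])  # the Event Hub name the stream reads from
```

A quick check like this makes it easy to confirm that EntityPath is actually present, since the connector needs it to know which hub to read.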
I'm using the azure_eventhubs_spark_2_11_2_3_6.jar package as recommended here, but I've also tried the latest version and keep getting the message:
ERROR : Some streams terminated before this command could finish!
I've used Databricks Runtime version 6.1 and rolled it back to 5.3, but I can't seem to get it up and running. I have a Python script that sends data to the Event Hub, but I can't see anything coming out of it. Is it the package, or something else I'm doing wrong?
Update: I was loading the library from a JAR file that I had downloaded. I deleted that and pulled the library from the Maven repository instead; after that it worked.
It works perfectly with the below configuration:
Databricks Runtime: 5.5 LTS (includes Apache Spark 2.4.3, Scala 2.11)
Azure EventHub library: com.microsoft.azure:azure-eventhubs-spark_2.11:2.3.13
With the above configuration, I was able to stream data from Azure Event Hubs.
Reference: Integrating Apache Spark with Azure Event Hubs
Hope this helps.
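For completeness, a sketch of the full notebook cell under that setup. This assumes the Maven library com.microsoft.azure:azure-eventhubs-spark_2.11:2.3.13 is attached to a Databricks Runtime 5.5 LTS cluster; the connection string values are placeholders:

```python
# Assumes a Databricks cluster with the Maven library
# com.microsoft.azure:azure-eventhubs-spark_2.11:2.3.13 attached.
# All angle-bracketed values below are placeholders.
connectionString = (
    "Endpoint=sb://<namespace>.servicebus.windows.net/;"
    "EntityPath=<eventhub>;"
    "SharedAccessKeyName=<policy>;"
    "SharedAccessKey=<key>"
)

ehConf = {'eventhubs.connectionString': connectionString}

df = (spark.readStream
      .format("eventhubs")
      .options(**ehConf)
      .load())

# Event Hubs delivers the payload as binary; cast it to string to inspect it.
readEventStream = df.withColumn("body", df["body"].cast("string"))
display(readEventStream)
```

The key difference from the failing setup is only how the connector library gets onto the cluster: installed from Maven coordinates rather than an uploaded JAR, so its transitive dependencies are resolved too.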