Tags: pyspark, databricks, azure-databricks, spark-structured-streaming, azure-storage-queues

Databricks: Azure Queue Storage structured streaming key not found error


I am trying to write an ETL pipeline for Azure Queue Storage (AQS) streaming data. Here is my code:

from pyspark.sql.types import StructType, StructField, IntegerType, TimestampType

CONN_STR = dbutils.secrets.get(scope="kvscope", key="AZURE-STORAGE-CONN-STR")

schema = StructType([
    StructField("id", IntegerType()),
    StructField("parkingId", IntegerType()),
    StructField("capacity", IntegerType()),
    StructField("freePlaces", IntegerType()),
    StructField("insertTime", TimestampType())
])

stream = spark.readStream \
    .format("abs-aqs") \
    .option("fileFormat", "json") \
    .option("queueName", "freeparkingplaces") \
    .option("connectionString", CONN_STR) \
    .schema(schema) \
    .load()

display(stream)

When I run this, I get java.util.NoSuchElementException: key not found: eventType.

Here is what my queue looks like: [screenshot of the queue messages]

Can you spot the problem and explain it to me?


Solution

  • The abs-aqs connector isn’t for consuming data from AQS itself; it’s for discovering new files in blob storage via events reported to AQS. That’s why you specify the file format option and a schema - those parameters are applied to the files, not to the messages in AQS. The key not found: eventType error most likely arises because the connector expects each queue message to be an Event Grid blob-created notification, which carries an eventType field, rather than your custom JSON payload.

    As far as I know (I could be wrong), there is no Spark connector that consumes AQS messages directly, and it’s usually recommended to use Event Hubs or Kafka as the messaging solution instead.
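
    For example, if you routed the parking events to Azure Event Hubs instead, you could consume them with the azure-eventhubs-spark connector. Here is a minimal sketch, assuming a hypothetical EVENTHUBS-CONN-STR secret holding the Event Hubs connection string in the same secret scope:

    from pyspark.sql.functions import col, from_json
    from pyspark.sql.types import StructType, StructField, IntegerType, TimestampType

    # Hypothetical secret key; adjust to wherever your Event Hubs connection string lives.
    eh_conn_str = dbutils.secrets.get(scope="kvscope", key="EVENTHUBS-CONN-STR")

    # The connector expects the connection string to be encrypted.
    ehConf = {
        "eventhubs.connectionString":
            sc._jvm.org.apache.spark.eventhubs.EventHubsUtils.encrypt(eh_conn_str)
    }

    schema = StructType([
        StructField("id", IntegerType()),
        StructField("parkingId", IntegerType()),
        StructField("capacity", IntegerType()),
        StructField("freePlaces", IntegerType()),
        StructField("insertTime", TimestampType())
    ])

    raw = spark.readStream.format("eventhubs").options(**ehConf).load()

    # Event Hubs delivers the payload in a binary `body` column;
    # cast it to a string and parse the JSON with your schema.
    parsed = (raw
        .select(from_json(col("body").cast("string"), schema).alias("data"))
        .select("data.*"))

    display(parsed)

    The same parsing pattern applies with Kafka, where the payload arrives in a binary value column instead of body.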