apache-spark pyspark azure-databricks azure-eventhub

Streaming from Azure Event Hub fails on Apache Spark Databricks: java.lang.IllegalArgumentException: failed to parse 1

When I attempt to stream data from Azure Event Hub with the query below, I get the following error:

Stream stopped...
java.lang.IllegalArgumentException: failed to parse 1

My code is as follows:

# Write the stream to a Delta table at `location`
streamingQuery = (
  df
  .writeStream
  .format("delta")
  .outputMode("append")
  .option("checkpointLocation", f"{location}/_checkpoints")
  .start(location)
)

The full error is as follows:

1643393895301Expected e.g. {"ehName":{"0":23,"1":-1},"ehNameB":{"0":-2}}
    at org.apache.spark.sql.eventhubs.JsonUtils$.partitionSeqNos(JsonUtils.scala:98)
    at org.apache.spark.sql.eventhubs.EventHubsSourceOffset$.apply(EventHubsSourceOffset.scala:61)
    at org.apache.spark.sql.eventhubs.EventHubsSource$$anon$1.deserialize(EventHubsSource.scala:139)
    at org.apache.spark.sql.eventhubs.EventHubsSource$$anon$1.deserialize(EventHubsSource.scala:115)
    at org.apache.spark.sql.execution.streaming.HDFSMetadataLog.readBatchFile(HDFSMetadataLog.scala:242)
    at org.apache.spark.sql.execution.streaming.HDFSMetadataLog.get(HDFSMetadataLog.scala:232)
    at org.apache.spark.sql.eventhubs.EventHubsSource.initialPartitionSeqNos$lzycompute(EventHubsSource.scala:170)
    at org.apache.spark.sql.eventhubs.EventHubsSource.initialPartitionSeqNos(EventHubsSource.scala:113)
    at org.apache.spark.sql.eventhubs.EventHubsSource.getBatch(EventHubsSource.scala:316)
    at org.apache.spark.sql.execution.streaming.MicroBatchExecution.$anonfun$runBatch$3(MicroBatchExecution.scala:492)
    at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:245)

Does anyone have any thoughts on where I might be going wrong?

Solution

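@Pattterson In my case, the issue was resolved by changing the checkpoint location. I had been reusing a checkpoint location from earlier test runs, so the Event Hubs source was presumably reading back offsets it had not written and could not parse them, which matches the deserialize failure in the stack trace.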
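One way to avoid reusing a stale checkpoint is to give each streaming query its own checkpoint directory, plus a unique suffix for throwaway test runs. The following is only a minimal sketch of that idea: `checkpoint_path`, `new_test_run_id`, `start_delta_stream`, and the directory layout are hypothetical names chosen for illustration, not part of Spark or the Event Hubs connector. Only the `writeStream` calls are taken from the question.

```python
import uuid
from typing import Optional


def checkpoint_path(location: str, query_name: str, run_id: Optional[str] = None) -> str:
    """Build a checkpoint directory dedicated to one streaming query.

    Hypothetical helper: the names and the directory layout are illustrative.
    """
    base = f"{location.rstrip('/')}/_checkpoints/{query_name}"
    # For throwaway test runs, append a unique suffix so that offsets
    # written by an earlier run are never read back.
    return f"{base}/{run_id}" if run_id else base


def new_test_run_id() -> str:
    """Return a random identifier for a one-off test run."""
    return uuid.uuid4().hex


def start_delta_stream(df, location: str, checkpoint: str):
    """Start the query from the question with an explicit checkpoint directory.

    `df` is assumed to be the streaming DataFrame read from Event Hubs.
    """
    return (
        df.writeStream
        .format("delta")
        .outputMode("append")
        .option("checkpointLocation", checkpoint)
        .start(location)
    )
```

In a test you might call `start_delta_stream(df, location, checkpoint_path(location, "eventhub_to_delta", new_test_run_id()))`. For a long-running production query, omit the run id so the same checkpoint is reused across restarts and the stream resumes from its own saved offsets.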