apache-spark  azure-blob-storage  spark-streaming  azure-eventhub

Error in setting eventhubs.checkpoint.dir in spark streaming job from event hub


I am trying to read Event Hub data by running a Spark Streaming job locally. I ran into a problem setting the Event Hub configuration value eventhubs.checkpoint.dir. I tried each of the following values:

  • wasbs://container_name@storage_name.blob.core.windows.net/
  • https://container_name@storage_name.blob.core.windows.net/
  • https://storage_name.blob.core.windows.net/container_name/

Each one resulted in an error similar to the following:

ERROR ReceiverTracker: Deregistered receiver for stream 0: Restarting receiver with delay 2000ms: Error handling message; restarting receiver -   java.io.IOException: No FileSystem for scheme: https
    at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2421)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2428)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:88)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2467)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2449)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:367)

Solution

  • Set eventhubs.checkpoint.dir to a string that is a valid wasb folder name, not a full URL. For instance, I set it to "/myeventhubspark". The folder is created automatically in the default container of your Spark cluster. Be sure to prefix the folder name with a forward slash, like this:

    "eventhubs.checkpoint.dir" -> "/myeventhubspark"
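
For context, here is a minimal sketch of how that entry might sit inside a parameter map, with a small guard against passing a full URL by mistake. The object name EventHubCheckpointConfig, the value names, and the guard itself are hypothetical and only for illustration. The other Event Hub settings your connector needs (namespace, policy, and so on) are left out because their exact key names depend on the library version you use.

```scala
// Sketch only: a parameter map holding the checkpoint setting from the answer above.
// EventHubCheckpointConfig, checkpointDir and eventHubParams are illustrative names,
// not part of any library.
object EventHubCheckpointConfig {

  // A folder path with no scheme (no wasbs:// or https:// prefix),
  // starting with a forward slash as the answer recommends.
  val checkpointDir: String = "/myeventhubspark"

  // Only the key discussed in this thread is shown; add the remaining
  // Event Hub settings required by your connector.
  val eventHubParams: Map[String, String] = Map(
    "eventhubs.checkpoint.dir" -> checkpointDir
  )

  def main(args: Array[String]): Unit = {
    // Fail fast if a full URL slipped in instead of a folder path.
    require(checkpointDir.startsWith("/"), "checkpoint dir must start with '/'")
    require(!checkpointDir.contains("://"), "checkpoint dir must not include a URI scheme")

    // Print the map so the final settings can be inspected before starting the job.
    eventHubParams.foreach { case (key, value) => println(s"$key -> $value") }
  }
}
```

The two require checks are just a convenience: they reject values like the https:// URLs from the question before the receiver ever starts, instead of letting the job fail later with "No FileSystem for scheme".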