
Spark Structured Streaming from a Delta table on cluster A to cluster B (different clusters)

I am trying to stream a Delta table from cluster A to cluster B, but I cannot read data from, or write data to, the other cluster:

streamingDf = spark.readStream.format("delta").option("ignoreChanges", "true") \

stream = streamingDf.writeStream.format("delta").option("checkpointLocation", "/tmp/checkpoint")\

Then, I get the following error:
org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block

So my question is whether it is possible to stream data directly between two clusters using the Delta format, or whether additional technologies are required to achieve this.



  • The error was caused by firewall rules: every node in cluster A must be able to reach every node in cluster B on the corresponding ports. I had only opened the ports on the NameNodes.
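
An HDFS client gets block locations from the NameNode but reads the blocks directly from the DataNodes, so a firewall that only opens the NameNode ports can produce exactly this kind of BlockMissingException. A rough way to verify the rule set is to run a small TCP reachability check from each node of cluster A against every node of cluster B. The sketch below is only an illustration: the host names are hypothetical, and the ports (8020 for NameNode RPC, 9866 for DataNode data transfer) are common defaults that I am assuming here, so substitute whatever your hdfs-site.xml actually configures.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS resolution failures.
        return False


def unreachable_endpoints(hosts, ports, timeout=3.0):
    """Return every (host, port) pair that could not be reached."""
    return [
        (host, port)
        for host in hosts
        for port in ports
        if not can_connect(host, port, timeout)
    ]


if __name__ == "__main__":
    # Hypothetical node names for cluster B; replace with the real ones.
    cluster_b_nodes = ["namenode-b.example.com", "datanode-b1.example.com"]
    # Assumed ports; check dfs.namenode.rpc-address and dfs.datanode.address.
    hdfs_ports = [8020, 9866]

    for host, port in unreachable_endpoints(cluster_b_nodes, hdfs_ports):
        print(f"blocked or down: {host}:{port}")
```

Run it on every worker of cluster A, not just the driver or the NameNode, since the Spark executors are the processes that open the connections to the remote DataNodes. A successful TCP connect only shows that the firewall lets the traffic through; it does not prove that HDFS itself is healthy.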