I am trying to stream a Delta table from cluster A to cluster B, but I am not able to load from or write to a different cluster:
# Read the Delta table on cluster A as a stream
streamingDf = spark.readStream.format("delta") \
    .option("ignoreChanges", "true") \
    .load("hdfs://cluster_A/delta-table")

# Write the stream to a Delta sink on cluster B
stream = streamingDf.writeStream.format("delta") \
    .option("checkpointLocation", "/tmp/checkpoint") \
    .start("hdfs://cluster_B/delta-sink")
Then, I get the following error:
org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block
So my question is: is it possible to stream data directly between two clusters using the Delta format, or are additional technologies required to achieve this?
Thanks!
The error was related to firewall rules: all the nodes in cluster A must have access to all the nodes in cluster B on the corresponding ports. I had only opened the ports on the NameNodes, but the DataNode ports must be reachable as well, because HDFS clients fetch the actual blocks directly from the DataNodes.
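A quick way to confirm this kind of firewall issue is to probe the relevant ports from a node in cluster A. The sketch below is a minimal TCP connectivity check; the hostnames in `cluster_b_nodes` are placeholders, and the ports shown (9866/9867) are the Hadoop 3.x DataNode defaults (older Hadoop 2.x deployments use 50010/50020), so substitute the values from your own `hdfs-site.xml`.

```python
import socket

def port_open(host, port, timeout=3):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Placeholder hostnames: check EVERY DataNode in cluster B, not just the NameNode.
cluster_b_nodes = ["datanode1.cluster-b", "datanode2.cluster-b"]

for node in cluster_b_nodes:
    # 9866 = DataNode data transfer port, 9867 = DataNode IPC port (Hadoop 3.x defaults)
    for port in (9866, 9867):
        status = "open" if port_open(node, port) else "BLOCKED"
        print(f"{node}:{port} {status}")
```

Running this from each node in cluster A and seeing `BLOCKED` on any DataNode port points at the same firewall problem that caused the `BlockMissingException` above.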