We need to transfer multiple JSON files from DBX (e.g. abfss://@.dfs.core.windows.net/folder1/json_files) via SFTP.
Is there any sample code/notebook with guidelines which we can follow for this task?
Tried the below code:
df.write.format("com.springml.spark.sftp") \
    .option("host", hostname) \
    .option("username", "user") \
    .option("password", "password") \
    .option("fileType", "json") \
    .save("/ftp/files/sample.json")
And I am getting this error:
java.lang.NoSuchMethodError: scala.Predef$.refArrayOps([Ljava/lang/Object;)Lscala/collection/mutable/ArrayOps;
That NoSuchMethodError usually points to a Scala version mismatch: the com.springml.spark.sftp connector was built against Scala 2.11, while recent Databricks runtimes run Scala 2.12. You can use the Spark File Transfer Library instead to transfer the files:
com.github.arcizon:spark-filetransfer_2.12:0.3.0 (for Scala 2.12)
First, install this library on your cluster.
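If you prefer scripting the install over using the cluster UI, the legacy Databricks CLI can attach the Maven package to a running cluster. This is a sketch: the cluster ID below is a placeholder you would replace with your own.

```shell
# Attach the spark-filetransfer Maven package to an existing cluster
# (cluster ID is a placeholder; requires the legacy Databricks CLI
# configured with your workspace credentials).
databricks libraries install \
  --cluster-id 0123-456789-abcde \
  --maven-coordinates com.github.arcizon:spark-filetransfer_2.12:0.3.0
```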
Using this library, you can read and write DataFrames in different file formats over SFTP. Below is an example of reading a text file.
df_txt = spark.read \
    .format("filetransfer") \
    .option("protocol", "sftp") \
    .option("host", host) \
    .option("port", "22") \
    .option("username", username) \
    .option("password", password) \
    .option("fileFormat", "text") \
    .load("/pub/example/readme.txt")
display(df_txt)
Similarly, you can write data to the SFTP server as below.
df = spark.read.json(adls_path)
df.write \
    .format("filetransfer") \
    .option("protocol", "sftp") \
    .option("host", host) \
    .option("port", "22") \
    .option("username", username) \
    .option("password", password) \
    .option("fileFormat", "json") \
    .save("data/upload/output/sample.json")
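One caveat: writing through Spark repartitions the data into part files rather than pushing the original JSON files one-to-one. If you need the files transferred byte-for-byte, a plain SFTP client such as paramiko also works. Below is a minimal sketch under a few assumptions: the files have first been copied to the driver's local disk (e.g. with dbutils.fs.cp), and the host, credentials, and directory names are placeholders.

```python
import os
import posixpath


def remote_path_for(local_file: str, remote_dir: str) -> str:
    """Build the remote destination path for a local file."""
    return posixpath.join(remote_dir, os.path.basename(local_file))


def sftp_upload_folder(local_dir, remote_dir, host, username, password, port=22):
    """Upload every .json file in local_dir to remote_dir over SFTP."""
    # Imported lazily so the path helper above is usable without paramiko.
    import paramiko

    transport = paramiko.Transport((host, port))
    transport.connect(username=username, password=password)
    sftp = paramiko.SFTPClient.from_transport(transport)
    try:
        for name in os.listdir(local_dir):
            if name.endswith(".json"):
                local_file = os.path.join(local_dir, name)
                sftp.put(local_file, remote_path_for(local_file, remote_dir))
    finally:
        sftp.close()
        transport.close()
```

For example, sftp_upload_folder("/tmp/json_files", "/ftp/files", host, username, password) would push each local JSON file unchanged, preserving the original file names.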
For more information, refer to the arcizon/spark-filetransfer GitHub repo.