Search code examples
azureazure-blob-storagespark-shell

Write Data as Json from DataFrame to Azure Blob Storage


I have some data in dataframe which i have to convert to json and store it into Azure Blob Storage. Is there any way to achieve this? Below are the steps which i have tried. I am trying it from spark-shell.

val df = spark.sql("select * from historic_data.all_historic_data").show()

spark.conf.set("fs.azure.account.key.<STORAGE_ACCOUNT_NAME>.blob.core.windows.net","STORAGE_ACCOUNT_KEY")

df.write.mode(SaveMode.Append).json("wasbs://BlobStorageContainer@<STORAGE_ACCOUNT_NAME>.blob.core.windows.net/<FOLDER_PATH_OF BLOB>/")

While running the write command i am getting below error

org.apache.hadoop.fs.azure.AzureException: com.microsoft.azure.storage.StorageException: The specifed resource name contains invalid characters.
  at org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.retrieveMetadata(AzureNativeFileSystemStore.java:2208)
  at org.apache.hadoop.fs.azure.NativeAzureFileSystem.getFileStatusInternal(NativeAzureFileSystem.java:2673)
  at org.apache.hadoop.fs.azure.NativeAzureFileSystem.getFileStatus(NativeAzureFileSystem.java:2618)
  at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1448)
  at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:92

Is there anything i have missed while providing blob details? below is screen shot of my storage account :

enter image description here

I havent seen any similiar kind of question here which will WRITE from dataframe as Json into Azure Blob.


Solution

  • The specifed resource name contains invalid characters.

    Message is clear about the problem. You are using uppercase characters in the container name, which is not allowed, so change BlobStorageContainer to blobstoragecontainer.

    A full list of rules can be found here.