azure, azure-blob-storage, azure-databricks

Write Databricks df to Azure Blob Storage


I'm using Azure Databricks and I want to write a dataframe to an Azure Blob Storage container. This is my current code:

spark.conf.set("fs.azure.account.key.sarcscdataplatform.dfs.core.windows.net", "<storage-account-key>")

source_table = "dbfs:/user/hive/warehouse/fan_enhanced"
destination_path = "abfss://gold-container@sarcscdataplatform.dfs.core.windows.net/output.csv"
dbutils.fs.cp(source_table, destination_path, recurse=True)

It creates the file, but it is always empty even though there is data in the dataframe. I look forward to everyone's answers, and thanks in advance!


Solution

  • dbutils.fs.cp only copies the table's underlying data files as they are; it doesn't convert them to CSV, which is why the copied output.csv ends up unusable. Write the dataframe out as CSV directly instead. You can try something like this:

    spark.conf.set("fs.azure.account.key.sarcscdataplatform.dfs.core.windows.net", "<storage-account-key>")
    
    output_container_path = ""abfss://gold-container@sarcscdataplatform.dfs.core.windows.net" % (output_container_name, storage_name)
    output_blob_folder = "%s/data_folder" % output_container_path
    
    
    # write the dataframe as a single CSV part file to a folder in blob storage
    (dataframe
     .coalesce(1)
     .write
     .mode("overwrite")
     .option("header", "true")
     .format("com.databricks.spark.csv")
     .save(output_blob_folder))
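
  • Spark writes that output as a folder containing a single part-*.csv file (because of coalesce(1)) plus some metadata files. If you want one blob named exactly output.csv, as in the question's destination_path, you can copy the part file afterwards. This is a minimal sketch that assumes the write above has already run and that output_blob_folder still points at the folder it produced:

    # find the single CSV part file Spark produced inside the output folder
    part_file = [f.path for f in dbutils.fs.ls(output_blob_folder)
                 if f.name.startswith("part-") and f.name.endswith(".csv")][0]

    # copy it to the exact blob name from the question, then drop the intermediate folder
    dbutils.fs.cp(part_file, "abfss://gold-container@sarcscdataplatform.dfs.core.windows.net/output.csv")
    dbutils.fs.rm(output_blob_folder, recurse=True)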