pandas, scala, databricks, azure-databricks

How do you write a CSV back to Azure Blob Storage using Databricks?


I'm struggling to write back to an Azure Blob Storage Container. I'm able to read from a container using the following:

storage_account_name = "expstorage"
storage_account_key = "1VP89J..."
container = "source"

spark.conf.set("fs.azure.account.key.{0}.blob.core.windows.net".format(storage_account_name), storage_account_key)

dbutils.fs.ls("dbfs:/mnt/azurestorage")

I've tried multiple methods I found by searching, but I can't find a definitive way to write back to my container.

Here is a link to an alternative that uses a SAS key, but I didn't want to mix and match key types:

Write dataframe to blob using azure databricks


Solution

  • To write to your Blob Storage, you just need to specify the path, starting with dbfs:/mnt/azurestorage:

    (df.write
        .mode("overwrite")
        .option("header", "true")
        .csv("dbfs:/mnt/azurestorage/filename.csv"))
    

    This will create a folder containing multiple distributed part files rather than one CSV. If you are looking for a single csv file, try this instead:

    df.toPandas().to_csv("/dbfs/mnt/azurestorage/filename.csv", index=False)
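
    Alternatively, if you want a single part file without converting to pandas, a common approach is to coalesce the DataFrame to one partition before writing. This is only a sketch, not a definitive recipe: it still produces a folder named filename.csv, with the data in a single part-*.csv file inside it, and it assumes the data fits on one executor.

```python
# Sketch: collapse to a single partition so Spark emits one part file.
# Note: Spark still creates a folder at the target path; the CSV itself
# is the single part-*.csv file inside that folder.
(df.coalesce(1)
    .write
    .mode("overwrite")
    .option("header", "true")
    .csv("dbfs:/mnt/azurestorage/filename.csv"))
```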
    

    If you are using pandas only, you will not have access to the DBFS API, so you need to use the local file API instead, which means your path has to start with /dbfs/ instead of dbfs:/, as follows:

    df.to_csv('/dbfs/mnt/azurestorage/filename.csv', index=False)
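
    As a quick sanity check of this local-path pattern, here is a minimal, self-contained sketch. It uses a temporary directory as a hypothetical stand-in for /dbfs/mnt/azurestorage (on a Databricks driver you would use the real /dbfs/... path), writes a small DataFrame with to_csv, and reads it back:

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for /dbfs/mnt/azurestorage; on Databricks you
# would pass the real /dbfs/... path to to_csv instead.
out_dir = tempfile.mkdtemp()
out_path = os.path.join(out_dir, "filename.csv")

df = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})

# index=False keeps the pandas index out of the CSV, as in the answer.
df.to_csv(out_path, index=False)

# Read it back to confirm a single, plain CSV file was written.
roundtrip = pd.read_csv(out_path)
print(roundtrip.equals(df))  # → True
```

    Unlike the Spark writer, this produces exactly one ordinary file at the given path, with no part files or folder wrapper.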