According to How do you write polars data frames to Azure blob storage?,
we can write Parquet with polars
directly to Azure Storage, such as basic storage containers.
In my case I was required to write in Delta format, which is built on top of Parquet, so I modified the code a bit, since polars
also supports Delta:
import adlfs
import polars as pl
from azure.identity.aio import DefaultAzureCredential
# pdf: pl.DataFrame
# path: str
# account_name: str
# container_name: str
credential = DefaultAzureCredential()
fs = adlfs.AzureBlobFileSystem(account_name=account_name, credential=credential)
with fs.open(f"{container_name}/way/to/{path}", mode="wb") as f:
    if path.endswith(".parquet"):
        pdf.write_parquet(f)
    else:
        pdf.write_delta(f, mode="append")
Using this code, I was able to write to the Azure filesystem when I specified path = "path/to/1.parquet",
but not path = "path/to/delta_folder/".
In the second case, only a 0-byte file was written to delta_folder
on Azure storage, since f
is a file pointer.
What's more, if I just use the local filesystem with pdf.write_delta(path, mode="append"),
it works.
How can I modify my code to write recursively into delta_folder/
in the cloud?
The issue is that Delta wants a folder so it can write (potentially) multiple files, so fsspec's
model of opening one file at a time isn't going to work.
You'll need to do something like:
credential = DefaultAzureCredential()
credentials_dict = {}  # object_store syntax, see link below

if path.endswith(".parquet"):
    with fs.open(f"{container_name}/way/to/{path}", mode="wb") as f:
        pdf.write_parquet(f)
else:
    pdf.write_delta(
        f"abfs://{container_name}/way/to/",
        mode="append",
        storage_options=credentials_dict,
    )
See here for the key fields that are compatible with credentials_dict.
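For instance, a minimal credentials_dict might look like the sketch below. The key names are assumptions based on the Azure options that object_store / delta-rs accept; fill in whichever credential type you actually use:

# Sketch only: key names assumed from the object_store / delta-rs Azure options.
credentials_dict = {
    "account_name": account_name,
    "account_key": "<storage-account-key>",
    # or, for a service principal:
    # "tenant_id": "<tenant-id>",
    # "client_id": "<client-id>",
    # "client_secret": "<client-secret>",
}

With these options, write_delta hands the abfs:// path straight to delta-rs, which authenticates and writes the Delta table's files itself rather than going through the adlfs/fsspec file object.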