Tags: django, azure, django-rest-framework, filesystems, azure-blob-storage

How to authenticate fsspec for azure blob storage


From a Django REST API view, I am trying to access a file that is stored in Azure Blob Storage. I would like to open it without first downloading it to a local file, as shown here. Read access is sufficient.

For this I sketched out my view like so:

import os
from fsspec.implementations.http import HTTPFileSystem

@api_view()
def my_view(request):
    url = "https://storageaccount.blob.core.windows.net/container/"
    filename = "file.f"
    fs = HTTPFileSystem(
        container_name=os.environ["AZURE_STORAGE_CONTAINER"],
        storage_options={
            "account_name": os.environ["AZURE_STORAGE_ACCOUNT"],
            "account_key": os.environ["AZURE_STORAGE_KEY"],
        },
    )
    with fs.open(url + filename, "r") as fobj:
        ds = somehow.open_dataset(fobj)

    return Response({"message": "Data manipulated"}, status=200)

This gives a FileNotFoundError.

My questions are:

  • Is this even possible with azure blob storage? If not, what would be the closest thing?
  • How would I authenticate the HTTPFileSystem? I feel like I more or less made those keywords up but wasn't able to find any information about it...

Solution

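  • It also took us a while to figure out how to access Azure Blob Storage from fsspec, so we are documenting it here. Note that the abfs protocol used below is provided by the adlfs package, which has to be installed alongside fsspec.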

    In the Azure portal, at the storage account level (not the container level), we clicked "Access keys" in the "Security + networking" section and copied an account key and a connection string from there.

    We created a $HOME/.env file with these key-value pairs:

    account_key=xxxxxx
    connection_string=xxxxxxx
    

    Then in Python, we did:

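    import os

    import fsspec  # the 'abfs' protocol requires the adlfs package
    from dotenv import load_dotenv

    # With no path given, load_dotenv() searches for .env upward from the script's directory.
    load_dotenv()

    storage_options = {
        "connection_string": os.environ["connection_string"],
        "account_key": os.environ["account_key"],
    }

    fs = fsspec.filesystem("abfs", **storage_options)

    # The first path component after abfs:// is the container name.
    url = "abfs://my-blob/my_object"
    fs.info(url)  # returns a dict of metadata if authentication succeeded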
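To connect this back to the question (opening the blob as a file object rather than downloading it), here is a minimal sketch, not a definitive implementation. It assumes the same environment variable names as the .env file above. The helper names (storage_options_from_env, abfs_filesystem, read_blob_head) are made up for illustration, and account_name is an extra variable you would have to add yourself if you do not use a connection string.

```python
import os
from typing import Mapping


def storage_options_from_env(env: Mapping[str, str] = os.environ) -> dict:
    """Build storage options; a connection string alone is enough if present."""
    if env.get("connection_string"):
        return {"connection_string": env["connection_string"]}
    # Assumption: account_name is NOT in the .env file shown above; an account
    # key is only usable together with the storage account name.
    return {
        "account_name": env["account_name"],
        "account_key": env["account_key"],
    }


def abfs_filesystem(env: Mapping[str, str] = os.environ):
    """Create the Azure filesystem (needs fsspec and adlfs installed)."""
    import fsspec  # imported lazily so the helpers above work without it

    return fsspec.filesystem("abfs", **storage_options_from_env(env))


def read_blob_head(fs, path: str, nbytes: int = 1024) -> bytes:
    """Open a blob as a file-like object and read only its first bytes."""
    with fs.open(path, "rb") as fobj:
        return fobj.read(nbytes)
```

In a view, something like read_blob_head(abfs_filesystem(), "abfs://my-blob/my_object") would then replace the HTTPFileSystem call from the question. The object returned by fs.open can also be handed to a library that accepts file-like objects, instead of calling read on it directly.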