Tags: python, pandas, dataframe, azure-functions, azure-data-lake

ImportError: Install adlfs to access Azure Datalake Gen2 and Azure Blob Storage even after adlfs is installed


I have an Azure Function with the code below:

storage_account_url = f"{self.datalake_settings.STORAGE_ENDPOINT}/{parquet_folder_path}/{file_name}.parquet"
storage_options = {
    "account_name": self.datalake_settings.STORAGE_ACCOUNT,
    "client_id": self.datalake_settings.RUNACCOUNT_ID,
    "client_secret": self.datalake_settings.RUNACCOUNT_KEY.get_secret_value(),
    "tenant_id": self.settings.TENANT_ID
}

df.to_parquet(storage_account_url, engine='pyarrow', compression='snappy', storage_options=storage_options)

This is my requirements.txt:

azure-functions
azure-identity
azure-storage-blob
azure-monitor-opentelemetry
opentelemetry-api
opentelemetry-sdk
opentelemetry-semantic-conventions
pydantic
adlfs
azure-storage-blob
azure-storage-file-datalake

This is my .venv/lib (screenshot of the installed packages, which includes adlfs).

When I run this code I get the following error:

System.Private.CoreLib: Exception while executing function: Functions.get_exchangerates_trigger. System.Private.CoreLib: Result: Failure Exception: ImportError: Install adlfs to access Azure Datalake Gen2 and Azure Blob Storage

Any ideas on how to troubleshoot this? It clearly looks like the adlfs and blob storage packages are installed.
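
For reference, a minimal runtime check like the one below (not part of the original function; the package list and logging calls are illustrative) would confirm whether the Functions worker can actually import adlfs and its fsspec dependency:

    import importlib.util
    import logging
    import sys

    # Log which interpreter is running and whether the relevant packages resolve
    logging.info("Python executable: %s", sys.executable)
    for pkg in ("adlfs", "fsspec"):
        spec = importlib.util.find_spec(pkg)
        logging.info("%s -> %s", pkg, spec.origin if spec else "NOT FOUND")

If adlfs resolves locally but shows up as NOT FOUND inside the Function host, the worker is loading packages from a different environment than the .venv shown above.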


Solution

  • I found another approach that works:

        import io

        from azure.identity import ClientSecretCredential
        from azure.storage.blob import BlobServiceClient

        # Authenticate with the service principal (client ID + secret)
        credential = ClientSecretCredential(
            tenant_id=self.settings.TENANT_ID,
            client_id=self.datalake_settings.RUNACCOUNT_ID,
            client_secret=self.datalake_settings.RUNACCOUNT_KEY.get_secret_value()
        )
        
        # Create blob service client
        account_url = f"https://{self.datalake_settings.STORAGE_ACCOUNT}.blob.core.windows.net"
        blob_service_client = BlobServiceClient(
            account_url=account_url,
            credential=credential
        )
        
        # Container name (hard-coded here; normally taken from EXTRACT_ROOT, format "container/path")
        container_name = "st-xx-lake-xxx-dev-ctn"
        
        # Get the blob path (everything after container name)
        blob_path = f"{parquet_folder_path}/{file_name}"
        
        # Get container client
        container_client = blob_service_client.get_container_client(container_name)
        
        # Write parquet to bytes buffer
        parquet_buffer = io.BytesIO()
        df.to_parquet(parquet_buffer, engine='pyarrow', compression='snappy')
        parquet_buffer.seek(0)
        
        # Upload the parquet file
        blob_client = container_client.upload_blob(
            name=blob_path,
            data=parquet_buffer,
            overwrite=True
        )
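
    This sidesteps fsspec/adlfs entirely: the DataFrame is serialized into an in-memory buffer and uploaded with the azure-storage-blob SDK, which was importing without issues. As a quick sanity check (illustrative only, assuming pandas is imported as pd and reusing container_client, blob_path and io from the snippet above), the file can be read back the same way:

    # Download the blob and load it back into a DataFrame to verify the round trip
    downloaded = container_client.get_blob_client(blob_path).download_blob().readall()
    df_roundtrip = pd.read_parquet(io.BytesIO(downloaded))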