I have an Azure Function with the code below:
storage_account_url = f"{self.datalake_settings.STORAGE_ENDPOINT}/{parquet_folder_path}/{file_name}.parquet"
storage_options = {
    "account_name": self.datalake_settings.STORAGE_ACCOUNT,
    "client_id": self.datalake_settings.RUNACCOUNT_ID,
    "client_secret": self.datalake_settings.RUNACCOUNT_KEY.get_secret_value(),
    "tenant_id": self.settings.TENANT_ID,
}
df.to_parquet(storage_account_url, engine='pyarrow', compression='snappy', storage_options=storage_options)
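For context on why the error below names adlfs: pandas does not talk to Azure storage itself; it hands the path and storage_options to fsspec, and fsspec maps the abfs:// and az:// URL schemes to adlfs (so presumably STORAGE_ENDPOINT uses one of those schemes). The same failure should be reproducible without pandas, e.g. with this sketch (the placeholder values stand in for my settings):

import fsspec

# fsspec resolves the URL scheme to a filesystem class; for "abfs" it tries to
# import adlfs and raises the same "Install adlfs ..." ImportError if it can't.
fs = fsspec.filesystem(
    "abfs",
    account_name="<storage-account>",
    tenant_id="<tenant-id>",
    client_id="<client-id>",
    client_secret="<client-secret>",
)
print(fs.ls("<container-name>"))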
This is my requirements.txt:
azure-functions
azure-identity
azure-storage-blob
azure-monitor-opentelemetry
opentelemetry-api
opentelemetry-sdk
opentelemetry-semantic-conventions
pydantic
adlfs
azure-storage-file-datalake
When I run this code, I get the following error:
System.Private.CoreLib: Exception while executing function: Functions.get_exchangerates_trigger. System.Private.CoreLib: Result: Failure Exception: ImportError: Install adlfs to access Azure Datalake Gen2 and Azure Blob Storage
Any ideas on how to troubleshoot this? It certainly looks as though adlfs and azure-storage-blob are installed, since both are listed in requirements.txt.
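One check that narrows this down is to log, from inside the Function app itself, which packages the worker can actually import and from where. A minimal sketch (the logger name and package list are my own choices); it also covers pandas and pyarrow, since df.to_parquet(engine='pyarrow') needs both and neither appears in the requirements.txt above:

import logging
import sys

logger = logging.getLogger("import_check")

# Log which of the relevant packages the worker can import, and from where.
for pkg in ("adlfs", "fsspec", "pandas", "pyarrow"):
    try:
        mod = __import__(pkg)
        logger.info("%s %s loaded from %s", pkg,
                    getattr(mod, "__version__", "?"), getattr(mod, "__file__", "?"))
    except ImportError as exc:
        logger.error("cannot import %s: %s", pkg, exc)

# The interpreter's search path shows whether site-packages / .python_packages is visible.
logger.info("sys.path = %s", sys.path)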
I found another approach that works:
import io

from azure.identity import ClientSecretCredential
from azure.storage.blob import BlobServiceClient

credential = ClientSecretCredential(
    tenant_id=self.settings.TENANT_ID,
    client_id=self.datalake_settings.RUNACCOUNT_ID,
    client_secret=self.datalake_settings.RUNACCOUNT_KEY.get_secret_value(),
)

# Create a blob service client for the storage account
account_url = f"https://{self.datalake_settings.STORAGE_ACCOUNT}.blob.core.windows.net"
blob_service_client = BlobServiceClient(
    account_url=account_url,
    credential=credential,
)

# Container name (redacted) and the blob path within that container
container_name = "st-xx-lake-xxx-dev-ctn"
blob_path = f"{parquet_folder_path}/{file_name}.parquet"

# Get container client
container_client = blob_service_client.get_container_client(container_name)

# Write the parquet file to an in-memory buffer
parquet_buffer = io.BytesIO()
df.to_parquet(parquet_buffer, engine='pyarrow', compression='snappy')
parquet_buffer.seek(0)

# Upload the buffer as a blob
blob_client = container_client.upload_blob(
    name=blob_path,
    data=parquet_buffer,
    overwrite=True,
)
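This route sidesteps the fsspec/adlfs layer entirely: the DataFrame is serialized to parquet in memory with pyarrow and uploaded through azure-storage-blob and azure-identity, both of which are already in the requirements.txt above. I would still like to understand why the storage_options route can't find adlfs.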