I am using the azure-storage-file-datalake package to connect to ADLS Gen2:
from azure.identity import ClientSecretCredential
from azure.storage.filedatalake import DataLakeServiceClient
# service principal credential
tenant_id = 'xxxxxxx'
client_id = 'xxxxxxxxx'
client_secret = 'xxxxxxxx'
storage_account_name = 'xxxxxxxx'
credential = ClientSecretCredential(tenant_id, client_id, client_secret)
# I have also tried blob instead of dfs in account_url
service_client = DataLakeServiceClient(
    account_url="{}://{}.dfs.core.windows.net".format("https", storage_account_name),
    credential=credential)
The folder structure in ADLS Gen2 from which I have to read the parquet file looks like this. Inside the container there is folder_a, which contains folder_b, which in turn holds the parquet file.
folder_a
  |-folder_b
      |-parquet_file1
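The Gen2 SDK addresses a file by its path relative to the container (file system), so the layout above should map to the values in the small sketch below. The account name is the same placeholder as in the snippet above, and the variable names are only for illustration.

```python
import posixpath

# Placeholder account name, as in the snippet above (not a real account).
storage_account_name = "xxxxxxxx"

# The Data Lake (dfs) endpoint of the storage account.
account_url = "https://{}.dfs.core.windows.net".format(storage_account_name)

# Path of the parquet file relative to the container, always with forward slashes.
file_path = posixpath.join("folder_a", "folder_b", "parquet_file1")

print(account_url)
print(file_path)
```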
With Gen1 storage we used to read the parquet file like this:
from azure.datalake.store import lib
from azure.datalake.store.core import AzureDLFileSystem
import pyarrow.parquet as pq
adls = lib.auth(tenant_id=directory_id,
                client_id=app_id,
                client_secret=app_key)
adl = AzureDLFileSystem(adls, store_name=adls_name)
# 'file' is the path of the parquet file, e.g. folder_a/folder_b/parquet_file1
f = adl.open(file, 'rb')
table = pq.read_table(f)
How do we do the same with Gen2 storage? We are stuck at this point.
So far we have followed http://peter-hoffmann.com/2020/azure-data-lake-storage-gen-2-with-python.html.
Note: we are not using Databricks for this.
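Regarding the issue, you can download the file into an in-memory buffer with the Data Lake client and pass that buffer to pyarrow. Please refer to the following code: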
from azure.identity import ClientSecretCredential
from azure.storage.filedatalake import DataLakeServiceClient
import pyarrow.parquet as pq
import io
client_id = ''
client_secret = ''
tenant_id = ''
credential = ClientSecretCredential(tenant_id, client_id, client_secret)
storage_account_name = 'testadls05'
service_client = DataLakeServiceClient(
    account_url="{}://{}.dfs.core.windows.net".format("https", storage_account_name),
    credential=credential)
file_system = '<container name>'
file_system_client = service_client.get_file_system_client(file_system)
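file_path = ''  # path inside the container, e.g. 'folder_a/folder_b/parquet_file1'
file_client = file_system_client.get_file_client(file_path)
data = file_client.download_file(0)  # download starting at offset 0
with io.BytesIO() as b:
    data.readinto(b)  # write the downloaded bytes into the in-memory buffer
    b.seek(0)  # rewind so the buffer is read from the start
    table = pq.read_table(b)
    print(table)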
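If you need this in more than one place, the same steps can be wrapped in two small helpers. This is only a sketch: download_to_buffer and read_parquet_table are names made up for illustration and are not part of either library. It assumes the file fits in memory and that file_system_client is created as shown above.

```python
import io


def download_to_buffer(file_client):
    """Download the whole file behind `file_client` into a seekable in-memory buffer.

    `file_client` is expected to behave like a DataLakeFileClient, i.e. its
    download_file() returns an object with a readinto(stream) method.
    """
    buffer = io.BytesIO()
    file_client.download_file().readinto(buffer)
    buffer.seek(0)  # rewind so readers start at the first byte
    return buffer


def read_parquet_table(file_system_client, file_path):
    """Read one parquet file from a container into a pyarrow Table."""
    # Imported here so that download_to_buffer can be used without pyarrow installed.
    import pyarrow.parquet as pq

    file_client = file_system_client.get_file_client(file_path)
    return pq.read_table(download_to_buffer(file_client))
```

With the folder layout from the question, the call would look like read_parquet_table(file_system_client, 'folder_a/folder_b/parquet_file1'), and table.to_pandas() gives you a DataFrame if you need one.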