Tags: python, azure, azure-data-lake, azure-databricks

Azure data lake - read using Python


I am trying to read a file from Azure Data Lake using Python in a Databricks notebook. This is the code I used:

from azure.storage.filedatalake import DataLakeFileClient

file = DataLakeFileClient.from_connection_string("DefaultEndpointsProtocol=https;AccountName=mydatalake;AccountKey=******;EndpointSuffix=core.windows.net",file_system_name="files", file_path="/2020/50002")

with open("./sample.txt", "wb") as my_file:
    download = file.download_file()
    content = download.readinto(my_file)
    print(content)

The output I get is 0. Can someone point out what I am doing wrong? My expectation is to print the file content.


Solution

  • The from_connection_string method returns a DataLakeFileClient, but you cannot download the file this way.

    If you want to download a file locally, you can refer to my code below.

    from azure.storage.filedatalake import DataLakeServiceClient
    
    # Connect to the storage account with a connection string
    service_client = DataLakeServiceClient.from_connection_string("DefaultEndpointsProtocol=https;AccountName=***;AccountKey=*****;EndpointSuffix=core.windows.net")
    
    # Navigate to the container (file system), directory, and file
    file_system_client = service_client.get_file_system_client(file_system="test")
    directory_client = file_system_client.get_directory_client("testdirectory")
    file_client = directory_client.get_file_client("test.txt")
    
    # Download the file contents as bytes
    download = file_client.download_file()
    downloaded_bytes = download.readall()
    
    # Write the bytes to a local file; the with block closes it automatically
    with open("./sample.txt", "wb") as my_file:
        my_file.write(downloaded_bytes)
    

    If you want more sample code, you can refer to this doc: Azure Data Lake Storage Gen2.
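
    If your goal is just to print the file content rather than save it locally, here is a minimal sketch along the same lines. It reuses the container name "files" and path "2020/50002" from the question and assumes the file is UTF-8 text; adjust those to your actual account and path.

    from azure.storage.filedatalake import DataLakeServiceClient
    
    service_client = DataLakeServiceClient.from_connection_string("DefaultEndpointsProtocol=https;AccountName=mydatalake;AccountKey=******;EndpointSuffix=core.windows.net")
    
    # Get a client for the file directly from the file system client
    file_system_client = service_client.get_file_system_client(file_system="files")
    file_client = file_system_client.get_file_client("2020/50002")
    
    # Read the whole file into memory and print it (assumes UTF-8 text content)
    downloaded_bytes = file_client.download_file().readall()
    print(downloaded_bytes.decode("utf-8"))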