python kedro mlops

Azure Data Lake Storage Gen2 (ADLS Gen2) as a data source for Kedro pipeline


According to Kedro's documentation, Azure Blob Storage is one of the available data sources. Does this extend to ADLS Gen2?

I haven't tried Kedro yet, but before I invest time in it, I want to make sure I can connect to ADLS Gen2.

Thank you in advance!


Solution

  • Yes, this works with Kedro. You're actually pointing at a really old version of the docs. Nowadays all filesystem-based datasets in Kedro use fsspec under the hood, which means they work seamlessly with S3, HDFS, local, and many other filesystems.

    ADLS Gen2 is supported by fsspec via the underlying adlfs library, which is documented here.

    From a Kedro point of view, all you need to do is declare your catalog entry like so:

     motorbikes:
         type: pandas.CSVDataSet
         filepath: abfs://your_bucket/data/02_intermediate/company/motorbikes.csv
         credentials: dev_az
    

    We also have more examples here, particularly example 15.
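
    To make the `credentials: dev_az` line a bit more concrete, here is a rough sketch of how that key relates to an entry in your credentials file. This is only an illustration in plain Python, not Kedro's actual implementation: the `resolve_entry` helper is hypothetical, the credential values are placeholders, and the `account_name` / `account_key` key names are an assumption based on the arguments adlfs accepts (other authentication options exist).

```python
from urllib.parse import urlsplit

# Hypothetical stand-in for conf/local/credentials.yml; values are placeholders.
# The key names are an assumption based on what adlfs accepts.
credentials = {
    "dev_az": {
        "account_name": "your_storage_account",
        "account_key": "your_account_key",
    }
}

# The catalog entry from above, written as a plain dict.
catalog_entry = {
    "type": "pandas.CSVDataSet",
    "filepath": "abfs://your_bucket/data/02_intermediate/company/motorbikes.csv",
    "credentials": "dev_az",
}


def resolve_entry(entry, credentials):
    """Illustrative helper (not a Kedro API): swap the credentials key
    for its mapping and split the filepath into its parts."""
    resolved = dict(entry)
    # Look up the named credentials block.
    resolved["credentials"] = credentials[entry["credentials"]]
    parts = urlsplit(entry["filepath"])
    resolved["protocol"] = parts.scheme       # tells fsspec which backend to use
    resolved["container"] = parts.netloc      # first segment after abfs://
    resolved["path"] = parts.path.lstrip("/")
    return resolved


resolved = resolve_entry(catalog_entry, credentials)
print(resolved["protocol"], resolved["container"], resolved["path"])
```

    The point is just that the catalog only stores the name `dev_az`, while the secrets themselves live in the credentials file, which you would normally keep out of version control.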