Tags: python, kedro

How do I add many CSV files to the catalog in Kedro?


I have hundreds of CSV files that I want to process in the same way. For simplicity, assume they are all in ./data/01_raw/ (e.g. ./data/01_raw/1.csv, ./data/01_raw/2.csv, and so on). I would much rather not give each file its own catalog entry and track them individually when building my pipeline. Is there any way to read all of them in bulk by specifying something in the catalog.yml file?


Solution

  • You are looking for PartitionedDataSet. In your example, the catalog.yml might look like this; a sketch of a node that consumes it follows below:

    my_partitioned_dataset:
      type: "PartitionedDataSet"
      path: "data/01_raw"
      dataset: "pandas.CSVDataSet"