I have hundreds of CSV files that I want to process similarly. For simplicity, we can assume that they are all in ./data/01_raw/ (like ./data/01_raw/1.csv, ./data/01_raw/2.csv, etc.). I would much rather not give each file a different name and keep track of them individually when building my pipeline. Is there any way to read all of them in bulk by specifying something in the catalog.yml file?
You are looking for PartitionedDataSet. In your example, the catalog.yml might look like this:
my_partitioned_dataset:
  type: "PartitionedDataSet"
  path: "data/01_raw"
  dataset: "pandas.CSVDataSet"
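Per Kedro's documented behavior, a node that takes this dataset as an input receives a dictionary mapping each partition id (the file's path relative to `path`, without the extension handling configured) to a zero-argument load function, so partitions are loaded lazily. A minimal sketch of such a node (the function name `concat_partitions` is my own; I collect loaded partitions into a list here to stay library-agnostic, but you could just as well `pd.concat` them):

```python
from typing import Any, Callable, Dict, List


def concat_partitions(partitioned_input: Dict[str, Callable[[], Any]]) -> List[Any]:
    """Load every partition of a PartitionedDataSet and return them in key order.

    Each value in ``partitioned_input`` is a callable that loads one
    underlying CSV (via pandas.CSVDataSet) only when invoked.
    """
    results = []
    for partition_id, load_func in sorted(partitioned_input.items()):
        # The CSV is read from disk only at this point, not at pipeline start.
        results.append(load_func())
    return results
```

Register `concat_partitions` as a node with `my_partitioned_dataset` as its input, and it will see every CSV under data/01_raw without you naming them individually.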