Search code examples
kedro

dynamic parameters on datasets in Kedro


I would like to call an API to enrich an existing dataset.

The existing dataset is a CSVDataSet configured in the catalog.
Now I would like to create a Node, that enriches the CSVDataSet with data from the API, that I have to call for every row in the CSV file. Then save the data into a database (SQLTableDataSet). My approach is to create an APIDataSet entry in the catalog and provide it as an input for the node, next to the CSVDataSet.
The issue here is, the APIDataSet is static (in general the DataSets seem to be very static). I need to call the load function at runtime within the Node for every entry in the csv file.

I didn't find a way to do this. Is it just a bad approach? Do I have to call the API within the Node instead of creating a APIDataSet?


Solution

  • I have done this in my GDALRasterDataSet implementation. The idea is that if you need to enrich a dataset on the go, you can overload the load() method in a custom dataset and pass additional parameters there.

    You can see an implementation here and an example of usage here.

    The only extra thing you need to do is to re-write the load() method to accept kwargs (line 143) and write your own _load method that enriches your dataset. Everything else is boilerplate.