Tags: python, pandas, csv, kedro

How do I select which columns to load in a Kedro CSVLocalDataSet?


I have a CSV file that looks like this:

a,b,c,d
1,2,3,4
5,6,7,8

I want to load it as a Kedro CSVLocalDataSet, but I don't want to read the entire file. I only need a few columns (say, a and b).

Is there any way for me to specify the list of columns to read/load?


Solution

  • CSVLocalDataSet loads the file with pandas.read_csv, which accepts a usecols parameter. You can pass it through the dataset's load_args parameter (all datasets accept additional parameters via load_args and save_args):

    my_cool_data:
      type: CSVLocalDataSet
      filepath: data/path.csv
      load_args: 
        usecols: ['a', 'b']
    

    Note that the same load_args mechanism works for any pandas-based dataset, as long as the underlying pandas reader accepts the parameter you pass.
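
To see what the catalog entry amounts to, here is a minimal sketch that passes the same load_args straight to pandas.read_csv. It assumes, as the answer states, that the dataset forwards load_args to pandas.read_csv. The in-memory csv_text is a stand-in for data/path.csv and is only there to keep the example self-contained.

```python
import io

import pandas as pd

# Sample data from the question, held in memory instead of data/path.csv
# (an assumption made so the sketch runs without any files on disk).
csv_text = "a,b,c,d\n1,2,3,4\n5,6,7,8\n"

# The same mapping the catalog entry supplies under load_args.
load_args = {"usecols": ["a", "b"]}

# Roughly what happens on load: load_args is forwarded to pandas.read_csv,
# so only the listed columns end up in the resulting DataFrame.
df = pd.read_csv(io.StringIO(csv_text), **load_args)

print(df.columns.tolist())  # ['a', 'b']
```

Columns c and d never appear in the resulting DataFrame, which is the behaviour the usecols entry in the catalog is meant to produce.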