I have a CSV file that looks like this:
a,b,c,d
1,2,3,4
5,6,7,8
and I want to load it as a Kedro CSVLocalDataSet, but I don't want to read the entire file. I only want a few columns (say a and b, for example).
Is there any way for me to specify the list of columns to read/load?
CSVLocalDataSet uses pandas.read_csv, which takes a usecols parameter. You can pass it through the dataset's load_args parameter (all datasets support passing additional parameters via load_args and save_args):
my_cool_data:
  type: CSVLocalDataSet
  filepath: data/path.csv
  load_args:
    usecols: ['a', 'b']
Also note that the same load_args approach works for any pandas-based dataset.
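As a rough illustration of what this catalog entry amounts to, the sketch below calls pandas.read_csv directly on the sample data from the question. It assumes that load_args is forwarded as keyword arguments to pandas.read_csv, as described above. The in-memory io.StringIO buffer and the csv_text variable are stand-ins for the real file, used only to keep the example self-contained.

```python
import io

import pandas as pd

# Sample data from the question, held in memory instead of data/path.csv
# (assumption: a stand-in so the sketch runs without a file on disk).
csv_text = "a,b,c,d\n1,2,3,4\n5,6,7,8\n"

# Mirrors the load_args section of the catalog entry:
# these are plain keyword arguments for pandas.read_csv.
load_args = {"usecols": ["a", "b"]}

df = pd.read_csv(io.StringIO(csv_text), **load_args)
print(df.columns.tolist())  # ['a', 'b']
```

Only columns a and b end up in the resulting DataFrame; c and d are left out.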