
How can we get the pipeline to read columns with special characters?


I am using the "usecols" parameter to read a subset of columns from an .xlsx file (with the xls_local.py file from the Kedro tutorial), but the program fails with "usecols do not match columns, columns expected but not found:" and lists only the columns whose names contain special characters. How can I fix this? Thank you very much for your attention.


Solution

  • As far as I can tell, this isn't a Kedro issue but a pandas.read_excel issue, since that is what Kedro uses under the hood. Matching column names that contain special characters seems to be broken in pandas itself. A workaround is to reference the columns by their Excel letters instead, for example usecols='A:D', and then rename the columns to what they should be, for example df.columns = ["colname with special characters", "b", "c", "d"].
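
The workaround above can be sketched in plain pandas as follows. This is a minimal sketch, not Kedro's own loader: the helper name read_columns_by_letter, the file name, and the column names are hypothetical, and it assumes an Excel engine such as openpyxl is installed so that pandas can read .xlsx files.

```python
import pandas as pd


def read_columns_by_letter(path, letter_range, column_names, sheet_name=0):
    """Read Excel columns by letter range, then assign the intended names.

    Hypothetical helper for illustration; not part of pandas or Kedro.
    """
    # Selecting by letters (e.g. "A:D") means pandas picks columns by
    # position and never has to match the special-character header names.
    df = pd.read_excel(path, sheet_name=sheet_name, usecols=letter_range)

    # Guard against a mismatch between the range and the supplied names.
    if len(df.columns) != len(column_names):
        raise ValueError(
            f"expected {len(column_names)} columns, got {len(df.columns)}"
        )

    # Overwrite the headers with the names the pipeline expects.
    df.columns = list(column_names)
    return df
```

A call might look like read_columns_by_letter("data.xlsx", "A:D", ["colname with special characters", "b", "c", "d"]), where data.xlsx stands in for your own file. The length check is a design choice: it fails early if the letter range and the list of names drift apart.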