
Pandas 'Usecols do not match columns, columns expected but not found'


Here is my data file:

[image: screenshot of the contents of data.csv]

The file can be read, and pandas reports a list of columns:

import pandas as pd

df = pd.read_csv('data.csv')
print(df.columns)

[image: output of print(df.columns)]

Why do I get an error when I try to read only one or a few of the columns?

df2 = pd.read_csv('data.csv', usecols=['<TICKER>'])

[image: traceback ending in "Usecols do not match columns, columns expected but not found"]


Solution

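  • Your CSV is malformed: each full line is wrapped in double quotes ("), so the parser treats every line as a single quoted field. The header is then read as one column whose name is the entire header line, which is why '<TICKER>' is not found.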

    You should remove them before trying to read the file as CSV.

    Here is an example that pre-processes the file to strip the outer " from each line:

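    from io import StringIO
    import pandas as pd

    # Drop the line ending, then the outer quotes, from every line before parsing.
    # rstrip() is safer than l[:-1] when the last line has no trailing newline.
    with open('data.csv') as f:
        cleaned = '\n'.join(line.rstrip('\r\n').strip('"') for line in f)

    df = pd.read_csv(StringIO(cleaned), usecols=['<TICKER>'])
    print(df)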
    

    Output:

      <TICKER>
    0     AFLT
    1     AFLT
    

    Another quick-and-dirty approach, assuming you don't have other quoted fields (i.e. only " on the outside of the lines), could be to consider the " as an extra separator:

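    import csv

    # A regex separator requires the python engine; QUOTE_NONE (== 3) disables quote handling
    df = pd.read_csv('data.csv', sep=',|"', usecols=['<TICKER>'],
                     engine='python', quoting=csv.QUOTE_NONE)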
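
To see the diagnosis above without touching the real file, here is a minimal sketch using Python's standard csv module, which by default uses the same " quote character as pandas. The sample text is hypothetical: only <TICKER> and AFLT come from the question, and the other column names and values are made up for illustration.

```python
import csv
from io import StringIO

# Hypothetical sample: every line is wrapped in one pair of double quotes.
# Only <TICKER> and AFLT come from the question; the rest is invented.
raw = '"<TICKER>,<PER>,<CLOSE>"\n"AFLT,D,100.5"\n"AFLT,D,101.0"\n'

# Parsed as-is, each line is a single quoted field, so the header has one name.
header_raw = next(csv.reader(StringIO(raw)))
print(header_raw)    # ['<TICKER>,<PER>,<CLOSE>']

# After stripping the outer quotes, the header splits into separate names.
cleaned = '\n'.join(line.strip('"') for line in raw.splitlines())
header_clean = next(csv.reader(StringIO(cleaned)))
print(header_clean)  # ['<TICKER>', '<PER>', '<CLOSE>']
```

If the first print shows a single long name on your own file's header line, the outer quotes are the likely cause of the usecols error.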