python, pandas, types

pandas read_csv set `dtype` by column index (not name)


file.txt has a header row and four columns, but the header names change all the time.

Something like:

,'non_standard_header_1','non_standard_header_2','non_standard_header_3'
,kdfjlkjdf, sdfdfd,,
,kdfjlkjwwdf, sdfddffd,,
,kdfjlkjwwdf,, sdfddffd,

I want to import file.txt into pandas, and I want the columns to be imported as object. The intuitive approach (to me) is to pass dtype = [object, object, object], as in:

    daily_file              = pandas.read_csv('file.txt',
                                              usecols      = [1, 2, 3],
                                              dtype        = [object, object, object])

This does not work. Running the above, I get:

data type not understood

How can I set the column dtypes on import without referencing the (existing) column names?


Solution

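  • read_csv's dtype parameter takes either a single type or a dict, not a list, which is why the list raises "data type not understood". pd.read_csv(..., dtype=object) will apply the object dtype to every column that is read in, if that's what you're looking for.

    Otherwise, if you want different dtypes per column, you'll need to pass a dict of the form {'col' : dtype} that maps column names to dtypes.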
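
    Both options can be sketched as follows. This is only an illustration under some assumptions: the CSV text is a made-up stand-in for file.txt (the id column and the values are invented), and the names raw, header, dtype_by_position and dtype_by_name are hypothetical. For the per-column case, one way to avoid hard-coding the changing names is to read just the header row first (nrows=0) and translate positions into whatever names the file currently has.

```python
import io

import pandas as pd

# Made-up stand-in for file.txt: a header row plus four columns.
raw = (
    "id,non_standard_header_1,non_standard_header_2,non_standard_header_3\n"
    "1,007,1.50,abc\n"
    "2,042,2.25,def\n"
)

# Option 1: a single dtype applied to every column that is read.
# usecols selects columns by position, so no names are needed at all.
df_all = pd.read_csv(io.StringIO(raw), usecols=[1, 2, 3], dtype=object)

# Option 2: different dtypes per column, still chosen by position.
# Read only the header row to learn the current column names ...
header = pd.read_csv(io.StringIO(raw), nrows=0).columns

# ... then turn a {position: dtype} mapping into the {name: dtype}
# dict that read_csv expects.
dtype_by_position = {1: object, 2: float, 3: object}
dtype_by_name = {header[i]: t for i, t in dtype_by_position.items()}

df_mixed = pd.read_csv(
    io.StringIO(raw), usecols=[1, 2, 3], dtype=dtype_by_name
)

print(df_all.dtypes)
print(df_mixed.dtypes)
```

    With dtype=object the values are kept as the strings found in the file, so leading zeros such as 007 survive. For a real file, replace io.StringIO(raw) with 'file.txt' in both calls; reading the header costs one extra, very cheap pass over the file.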