
How to read categorical columns with pandas' read_csv?


I have tried passing the dtype parameter to read_csv as dtype={n: pandas.Categorical}, but this does not work properly (the resulting column has object dtype). The manual is unclear.

Is it possible to read categorical columns with pd.read_csv?


Solution

  • Since version 0.19.0, you can pass dtype='category' to read_csv:

    import io
    import pandas as pd

    data = 'col1,col2,col3\na,b,1\na,b,2\nc,d,3'
    df = pd.read_csv(io.StringIO(data), dtype='category')
    print (df)
      col1 col2 col3
    0    a    b    1
    1    a    b    2
    2    c    d    3
    
    print (df.dtypes)
    col1    category
    col2    category
    col3    category
    dtype: object
    

    To make only specific columns categorical, pass a dictionary to dtype:

    df = pd.read_csv(io.StringIO(data), dtype={'col1': 'category'})
    print (df)
      col1 col2  col3
    0    a    b     1
    1    a    b     2
    2    c    d     3
    
    print (df.dtypes)
    col1    category
    col2      object
    col3       int64
    dtype: object
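One detail worth checking in the first example: with dtype='category', pandas parses the category labels as strings, so col3 ends up with the categories '1', '2', '3' rather than integers. Here is a minimal sketch, assuming a reasonably recent pandas and the same sample data, that inspects the categories and converts them back to numbers with pd.to_numeric and Series.cat.rename_categories while keeping the column categorical:

```python
import io

import pandas as pd

data = 'col1,col2,col3\na,b,1\na,b,2\nc,d,3'

# Every column is read as categorical; the category labels are parsed as strings.
df = pd.read_csv(io.StringIO(data), dtype='category')
print(df['col3'].cat.categories.tolist())  # ['1', '2', '3']

# Turn the string labels of col3 into numbers; the column stays categorical.
numeric_categories = pd.to_numeric(df['col3'].cat.categories)
df['col3'] = df['col3'].cat.rename_categories(numeric_categories)
print(df['col3'].cat.categories.tolist())  # [1, 2, 3]
print(df['col3'].dtype)  # category
```

If col3 does not need to be categorical at all, the dictionary form above (leaving col3 out of the dict) is simpler, since read_csv then infers int64 for it directly.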