
How to downcast numeric columns in Pandas?


How can I reduce a DataFrame's memory footprint by finding the smallest suitable dtypes for its numeric columns? For example:

   A        B    C         D
0  1  1000000  1.1  1.111111
1  2 -1000000  2.1  2.111111

>>> df.dtypes
A      int64
B      int64
C    float64
D    float64

Expected result:

>>> df.dtypes
A       int8
B      int32
C    float32
D    float32
dtype: object
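The expected dtypes follow from each column's value range: an integer column needs the smallest integer type whose range contains both its minimum and its maximum. Column B holds ±1,000,000, which exceeds int16's maximum of 32,767 but fits in int32. A minimal sketch of that check with np.iinfo, assuming a hypothetical helper named smallest_int_dtype that, like downcast='integer', only considers signed types:

```python
import numpy as np

# Signed candidates, smallest first (assumption: unsigned types are ignored)
INT_CANDIDATES = [np.int8, np.int16, np.int32, np.int64]

def smallest_int_dtype(values):
    """Hypothetical helper: return the smallest signed integer dtype
    whose range covers every value in `values`."""
    lo, hi = min(values), max(values)
    for candidate in INT_CANDIDATES:
        info = np.iinfo(candidate)  # min/max representable by this dtype
        if info.min <= lo and hi <= info.max:
            return np.dtype(candidate)
    raise OverflowError("values do not fit in int64")

print(smallest_int_dtype([1, 2]))               # int8
print(smallest_int_dtype([1000000, -1000000]))  # int32
```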

Solution

  • You can use the downcast parameter of to_numeric, selecting the integer and float columns with DataFrame.select_dtypes. This works in pandas 0.19+, as @anurag mentioned, thank you:

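    import pandas as pd
    
    df = pd.DataFrame({'A': [1, 2],
                       'B': [1000000, -1000000],
                       'C': [1.1, 2.1],
                       'D': [1.111111, 2.111111]})
    
    # select float and integer columns separately
    fcols = df.select_dtypes('float').columns
    icols = df.select_dtypes('integer').columns
    
    # downcast each group to the smallest dtype that can hold its values
    df[fcols] = df[fcols].apply(pd.to_numeric, downcast='float')
    df[icols] = df[icols].apply(pd.to_numeric, downcast='integer')
    
    print(df.dtypes)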
    A       int8
    B      int32
    C    float32
    D    float32
    dtype: object
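If you need this for several frames, the same to_numeric calls can be wrapped in a function that also lets you check the memory saved. A sketch, assuming a hypothetical helper name downcast_numeric; it converts one column at a time instead of using apply, and returns a copy rather than modifying df in place:

```python
import pandas as pd

def downcast_numeric(df):
    """Hypothetical helper: return a copy of `df` with float and integer
    columns downcast via pd.to_numeric."""
    out = df.copy()
    for col in out.select_dtypes('float').columns:
        out[col] = pd.to_numeric(out[col], downcast='float')
    for col in out.select_dtypes('integer').columns:
        out[col] = pd.to_numeric(out[col], downcast='integer')
    return out

df = pd.DataFrame({'A': [1, 2],
                   'B': [1000000, -1000000],
                   'C': [1.1, 2.1],
                   'D': [1.111111, 2.111111]})
small = downcast_numeric(df)

# memory_usage(deep=True) reports bytes per column, index included
before = df.memory_usage(deep=True).sum()
after = small.memory_usage(deep=True).sum()
print(small.dtypes)
print(f"{before} bytes -> {after} bytes")
```

In this example the four 16-byte int64/float64 columns shrink to 26 bytes of column data in total (2 + 8 + 8 + 8); the index is the same size in both frames.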