How can I reduce a DataFrame's memory footprint by finding the smallest suitable dtypes
for its numeric columns? For example:
   A        B    C         D
0  1  1000000  1.1  1.111111
1  2 -1000000  2.1  2.111111
>>> df.dtypes
A int64
B int64
C float64
D float64
Expected result:
>>> df.dtypes
A int8
B int32
C float32
D float32
dtype: object
You can use the downcast parameter of to_numeric, selecting the integer and float columns with DataFrame.select_dtypes. This works in pandas 0.19+, as @anurag mentioned (thank you):
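import pandas as pd
# Select the float and integer columns by dtype
fcols = df.select_dtypes('float').columns
icols = df.select_dtypes('integer').columns
# Downcast each group to the smallest dtype that can hold its values
df[fcols] = df[fcols].apply(pd.to_numeric, downcast='float')
df[icols] = df[icols].apply(pd.to_numeric, downcast='integer')
print(df.dtypes)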
A int8
B int32
C float32
D float32
dtype: object
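
To check how much memory the downcast actually saves, here is a minimal self-contained sketch. It rebuilds the example frame from the question and compares DataFrame.memory_usage before and after. The helper name downcast_numeric is hypothetical (it is not part of pandas), and it assigns column by column rather than using apply, which is just one way to write the same idea.

```python
import pandas as pd

# Rebuild the example frame from the question
df = pd.DataFrame({
    "A": [1, 2],
    "B": [1000000, -1000000],
    "C": [1.1, 2.1],
    "D": [1.111111, 2.111111],
})


def downcast_numeric(frame):
    """Hypothetical helper: return a copy with numeric columns downcast."""
    out = frame.copy()
    # Floats are downcast to float32 when the values survive the conversion
    for col in out.select_dtypes('float').columns:
        out[col] = pd.to_numeric(out[col], downcast='float')
    # Integers are downcast to the smallest signed integer dtype that fits
    for col in out.select_dtypes('integer').columns:
        out[col] = pd.to_numeric(out[col], downcast='integer')
    return out


# Bytes used by the column data only (index excluded)
before = df.memory_usage(index=False).sum()
small = downcast_numeric(df)
after = small.memory_usage(index=False).sum()

print(small.dtypes)
print("bytes before:", before, "bytes after:", after)
```

If the integer columns are known to be non-negative, to_numeric also accepts downcast='unsigned', which may allow an even smaller dtype. Note that downcasting floats to float32 reduces precision, so it is worth checking that the results are still accurate enough for your use case.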