I'm trying to replace integer values such as -1 and 0 with NaN. Here is the code:
df = df.replace(0, np.nan)
df = df.replace(-1, np.nan)
However, the DataFrame is large:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891221 entries, 0 to 891220
Columns: 366 entries, LNR to ALTERSKATEGORIE_GROB
dtypes: float64(267), int64(93), object(6)
memory usage: 2.4+ GB
and it takes a long time when I run it.
Is there a faster alternative to this code?
You are recreating the DataFrame and assigning df to a new object, and you are doing that twice! Why not do both replacements in place with a single call, like this?
df.replace([0,-1], np.nan, inplace=True)
inplace is False by default. For more info and examples on replace, check the docs.
Here is some skeleton code showing that you get a speedup of almost 2x doing it this way compared to your original approach.
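The original timing code is not reproduced here, so what follows is only a minimal sketch of how such a comparison could be set up. The synthetic DataFrame, its shape, and the helper names two_calls and one_call are assumptions made for illustration; actual timings will depend on your data, dtypes, and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic stand-in for the real data (hypothetical shape and values).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(-1, 5, size=(20_000, 20)))


def two_calls(frame):
    # Approach from the question: two passes, each returning a new DataFrame.
    out = frame.replace(0, np.nan)
    return out.replace(-1, np.nan)


def one_call(frame):
    # Approach from the answer: both values replaced in a single call.
    # The copy keeps the input intact so the benchmark can be repeated.
    out = frame.copy()
    out.replace([0, -1], np.nan, inplace=True)
    return out


# Both approaches should mark exactly the same cells as NaN.
assert two_calls(df).isna().equals(one_call(df).isna())

t_two = timeit.timeit(lambda: two_calls(df), number=5)
t_one = timeit.timeit(lambda: one_call(df), number=5)
print(f"two calls: {t_two:.3f}s, one call: {t_one:.3f}s")
```

Note that one_call pays for an extra copy() on every run so the input stays untouched between repetitions; on your real DataFrame you would call replace with inplace=True directly.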