Tags: python, data-science, feature-engineering

Is there another way to replace values like zero and minus one with NaNs?


I'm trying to replace integer values like -1 and 0 with NaN. Here is the code:

df = df.replace(0, np.nan)
df = df.replace(-1, np.nan)

However, the DataFrame is large:

 <class 'pandas.core.frame.DataFrame'>
 RangeIndex: 891221 entries, 0 to 891220
 Columns: 366 entries, LNR to ALTERSKATEGORIE_GROB
 dtypes: float64(267), int64(93), object(6)
 memory usage: 2.4+ GB

and it takes a long time when I run it.

Is there any faster alternative to this code?


Solution

  • Each call to replace builds a new DataFrame and rebinds df to it, and you are doing that twice! Why not replace both values in a single in-place call, like this?

    df.replace([0,-1], np.nan, inplace=True)
    

    inplace is False by default. For more information and examples on replace, check the docs.

    Here is some skeleton code showing that this gives a speedup of almost 2x compared with your original approach.
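
The answerer's timing code is not included in this copy of the answer. As a rough stand-in, here is a minimal sketch of how such a comparison could be set up. It is not the original benchmark: the synthetic integer DataFrame `base` and the helper names `two_calls`, `one_call`, and `timed` are assumptions made for illustration, and the measured ratio will vary with pandas version, column dtypes, and data size.

```python
import time

import numpy as np
import pandas as pd

# Hypothetical synthetic data, far smaller than the 891221 x 366 frame in the question.
rng = np.random.default_rng(0)
base = pd.DataFrame(
    rng.integers(-1, 5, size=(50_000, 10)),
    columns=[f"col_{i}" for i in range(10)],
)


def two_calls(df):
    """Approach from the question: two replace calls, each returning a new frame."""
    df = df.replace(0, np.nan)
    df = df.replace(-1, np.nan)
    return df


def one_call(df):
    """Approach from the answer: both values replaced in one in-place call."""
    df.replace([0, -1], np.nan, inplace=True)
    return df


def timed(func, df, repeats=3):
    """Return the best wall-clock time in seconds over `repeats` runs."""
    best = float("inf")
    for _ in range(repeats):
        fresh = df.copy()  # untimed copy, so in-place edits never touch `base`
        start = time.perf_counter()
        func(fresh)
        best = min(best, time.perf_counter() - start)
    return best


result_two = two_calls(base)
result_one = one_call(base.copy())

# Sanity check: both approaches should yield the same frame.
same = result_two.equals(result_one)

print(f"two calls: {timed(two_calls, base):.3f} s")
print(f"one call:  {timed(one_call, base):.3f} s")
print("identical results:", same)
```

Note that the integer columns come back as float64 in both cases, since NaN cannot be stored in an int64 column. Timing on a sample of your own data is the only reliable way to see how much the single call saves in your setup.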