python pandas function variable-assignment in-place

pandas.DataFrame: difference between inplace = True and assigning the same variable?


I am replacing -np.inf and np.inf with np.nan within a pandas data frame.

However, when I use inplace = True, I get a warning:

df.replace([np.inf, -np.inf], np.nan, inplace = True)

SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame

Re-assigning the result to the same variable (I am not sure whether this is a smart idea) seems to "solve" the issue:

df = df.replace([np.inf, -np.inf], np.nan)

I have read in another question that pandas distinguishes between a copy and a view, and that when it is not clear which of the two you hold, modifying one variable could affect the other too.

Should I refrain from using inplace?

Just as background: I have a data frame with stock prices, some of which are missing. I have a function that uses this data frame but "cleans" up the data before processing.

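def func(df):
    # Drop every column that contains at least one missing value
    df_aux = df.dropna(axis = 1)
    # These two in-place calls are what trigger the SettingWithCopyWarning
    df_aux.replace([np.inf, -np.inf], np.nan, inplace = True)
    df_aux.fillna(method = 'ffill', inplace = True)

    # ... some calculation with df_aux ...

    return x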

Solution

  • You need to call copy():

    df_aux = df.dropna(axis = 1).copy()
    

    Without the copy, if you modify values in df_aux later, you will find that the modifications do not propagate back to the original data df, and that is what pandas is warning you about.
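
    As a rough illustration of the advice above, here is a minimal sketch of the cleaning step with an explicit copy. The function name clean_prices and the sample frame prices are made up for this example and are not from the question. The sketch uses reassignment instead of inplace = True, and ffill() instead of fillna(method = 'ffill'), because newer pandas versions deprecate the method argument.

```python
import numpy as np
import pandas as pd


def clean_prices(df):
    """Return a cleaned copy of df; the caller's frame is left untouched.

    Hypothetical helper, named for this example only.
    """
    # Explicit copy: df_aux is an independent frame, so pandas has no
    # reason to suspect we are writing into a slice of df.
    df_aux = df.dropna(axis=1).copy()
    # Reassign instead of using inplace=True; each call returns a new frame.
    df_aux = df_aux.replace([np.inf, -np.inf], np.nan)
    # Forward-fill the NaNs that replaced the infinities.
    df_aux = df_aux.ffill()
    return df_aux


# Made-up sample data standing in for the stock prices.
prices = pd.DataFrame({
    "A": [1.0, np.inf, 3.0],
    "B": [1.0, np.nan, 3.0],   # contains NaN, so dropna(axis=1) drops it
    "C": [4.0, 5.0, -np.inf],
})

cleaned = clean_prices(prices)
print(cleaned)
print(prices)  # still contains the original inf values
```

    Keep in mind that a method called with inplace = True returns None, so the two styles should not be mixed: writing df = df.replace(..., inplace = True) would leave df set to None.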