Any situation where I would want to use inplace=True vs df=df?

I'm a bit confused about the inplace argument. In particular, is there any benefit to using it over the standard approach of writing df = df.do_something(), which makes it explicit that we are changing the dataframe we are working with?

Solution

There is a big difference when several names point to the same object. df = df.do_something() does not work in place: it creates a new object and rebinds the name df to it, leaving the original object untouched.

Here is an example demonstrating the difference.

When df is reassigned, df2 is unchanged: it still points to the original object, while df now points to the new output.

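import pandas as pd

df = pd.DataFrame({'col1': [1, float('nan'), 3]})
df2 = df  # df2 is a second name for the same object

df = df.dropna()  # rebinds df to a new DataFrame; the original is untouched
print(df2)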

   col1
0   1.0
1   NaN
2   3.0

With inplace=True, the object itself is modified, so df2 reflects the change as well.

df = pd.DataFrame({'col1': [1, float('nan'), 3]})
df2 = df  # df2 is a second name for the same object

df.dropna(inplace=True)  # modifies the object that both df and df2 refer to
print(df2)

   col1
0   1.0
2   3.0
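If it helps, the two cases can also be checked with Python's `is` operator, which tests whether two names refer to the same object. This is only a minimal sketch built on the example above; the name `cleaned` is introduced here for illustration and is not part of the original example.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, float('nan'), 3]})
df2 = df  # second name bound to the same object

# Without inplace: dropna() returns a new DataFrame and leaves df alone.
cleaned = df.dropna()
print(cleaned is df)          # False: a different object
print(df2 is df)              # True: still the same object
print(len(df), len(cleaned))  # 3 2

# With inplace: the shared object itself loses the NaN row.
df.dropna(inplace=True)
print(df2 is df)              # True: no rebinding happened
print(len(df2))               # 2
```

So reassignment only changes which object the name on the left-hand side refers to, while inplace=True changes the object that every name shares.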

In real life, I don't really see a use case for keeping multiple references to the same object. Given pandas' complexity, it's sometimes hard to keep track of views and copies, so I would recommend avoiding inplace=True. In fact, there is an ongoing discussion about removing this parameter from most functions.