I'm a bit confused about the inplace argument. In particular, is there any benefit to using it over the standard approach of just writing df = df.do_something(), which makes it clear that we are changing the dataframe we are working with?
There is a big difference if several names point to the same object. df = df.do_something() does not work in place; it just assigns the returned copy to the same name.
Here is an example demonstrating the difference.
When df is reassigned, df2 is unchanged: it still points to the original object, while df points to the new output.
import pandas as pd

df = pd.DataFrame({'col1': [1, float('nan'), 3]})
df2 = df            # df2 is a second name for the same object
df = df.dropna()    # df now refers to a new DataFrame
print(df2)
   col1
0   1.0
1   NaN
2   3.0
With inplace=True, the object is modified in place, so the change is also visible through df2.
df = pd.DataFrame({'col1': [1, float('nan'), 3]})
df2 = df                  # df2 is a second name for the same object
df.dropna(inplace=True)   # modifies the object that both names refer to
print(df2)
   col1
0   1.0
2   3.0
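The same distinction can be checked directly with Python's `is` operator, which tests whether two names refer to the same object. This is only a small sketch, assuming pandas is installed; the names `original`, `alias` and `reassigned` are illustrative and not part of the examples above.

```python
import pandas as pd

original = pd.DataFrame({'col1': [1, float('nan'), 3]})
alias = original  # a second name for the same object

# dropna() without inplace returns a new DataFrame; the alias is untouched.
reassigned = original.dropna()
print(reassigned is alias)  # False
print(len(alias))           # 3

# With inplace=True the shared object itself is modified.
original.dropna(inplace=True)
print(original is alias)    # True
print(len(alias))           # 2
```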
In real life, I don't really see a use case for keeping multiple references to the same object. Given pandas' complexity, it is sometimes hard to keep track of views and copies, so I would recommend avoiding inplace=True. In fact, there is a discussion about removing this parameter from most functions.
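For what it's worth, here is a rough sketch of what the reassignment style can look like without inplace=True. Since each method returns a new DataFrame, calls can be chained. The name `cleaned` and the extra reset_index step are just illustrative choices of mine.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, float('nan'), 3]})

# Each call returns a new DataFrame, so the steps can be chained
# and the result bound to a name once at the end.
cleaned = df.dropna().reset_index(drop=True)

print(cleaned['col1'].tolist())  # [1.0, 3.0]
print(len(df))                   # 3, the original is left untouched
```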