For pandas DataFrames in Python, several methods take an inplace parameter, which purportedly lets you modify the original object directly rather than create a copy of it*.
[*Edited to add: this turns out not to be the case, as pointed out by @juanpa.arrivillaga. inplace=True DOES copy data and merely updates a pointer associated with the modified object, so it has few advantages over manually re-assigning the result to the name of the original object.]
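As a minimal sketch of the point in the edit note, the two spellings below end up with the same data. The frames `a` and `b` are made up for illustration, and the comparison only shows that the results are equal; it does not measure memory use or copying.

```python
import numpy as np
import pandas as pd

# Two identical illustrative frames (hypothetical data, not from the question).
a = pd.DataFrame({"value": [1.0, np.nan, 3.0]})
b = a.copy()

# Spelling 1: ask pandas to update the object bound to `a`.
a.dropna(inplace=True)

# Spelling 2: build a new object and rebind the name `b` to it.
b = b.dropna()

# Both names now refer to frames with the same index and values.
same = a.equals(b)
print(same)
```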
The examples of inplace=True that I have seen online do not use chaining. This comment in a related SO thread may explain why I don't see such examples anywhere:
you can't method chain and operate in-place. in-place ops return None and break the chain
But would "inplace chaining" work if you put inplace=True in the last entry in the chain? [Edited to add: no] Or would that be equivalent to trying to change a copy created in an earlier link in the chain, which, as it is no longer your original object, is "lost" after the chain statement completes? [Edited to add: yes; see answer here]
Working with large data objects would seem to rule out chaining unless it can be done in-place, at least if you want to keep memory overhead low and computation fast. Is there an alternative implementation of pandas, or an equivalent of R's data.table, available in Python that might suit my needs? Or are my only options to not chain (and compute quickly) or to chain but make redundant copies of the data, at least transiently?
Let's try it.
import pandas as pd
import numpy as np
df = pd.DataFrame({'value': [2, 2, 1, 1, 3, 4, 5, np.nan]})
df.sort_values('value').drop_duplicates().dropna(inplace=True)
Expect:
   value
2    1.0
0    2.0
4    3.0
5    4.0
6    5.0
Result:
   value
0    2.0
1    2.0
2    1.0
3    1.0
4    3.0
5    4.0
6    5.0
7    NaN
Answer: No, inplace=True at the end of the chain does not modify the original DataFrame.
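For comparison, here is a sketch of the same chain written without inplace=True and assigned to a name. The variable names `rows_after_inplace_chain` and `cleaned` are introduced here for illustration. This still creates intermediate objects along the way; it only shows that assigning the result of the chain gives the expected values.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"value": [2, 2, 1, 1, 3, 4, 5, np.nan]})

# inplace=True on the last link acts on the temporary produced by the
# earlier links, so the object bound to `df` keeps all of its rows.
df.sort_values("value").drop_duplicates().dropna(inplace=True)
rows_after_inplace_chain = len(df)
print(rows_after_inplace_chain)

# Dropping inplace=True and assigning the chain's result keeps the output.
cleaned = df.sort_values("value").drop_duplicates().dropna()
print(cleaned["value"].tolist())
```

Assigning back to `df` instead of `cleaned` would give the effect the question was after, with the intermediate frames becoming eligible for garbage collection once the statement finishes.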