python pandas method-chaining in-place

pandas chaining and the use of "inplace" parameter


For pandas DataFrames in Python, many methods have an inplace parameter which purportedly allows you to NOT create a copy of the object, but rather to modify the original object directly*.

[*Edited to add: however, this proves not to be the case, as pointed out by @juanpa.arrivillaga. inplace=True DOES copy data and merely updates a pointer associated with the modified object, so it has few advantages over manually re-assigning the result to the name of the original object.]

Examples that I have seen online for the use of inplace=True do not include examples where chaining is used. This comment in a related SO thread may be an answer to why I don't see such examples anywhere:

you can't method chain and operate in-place. in-place ops return None and break the chain

But, would "inplace chaining" work if you put an inplace=True in the last entry in the chain? [Edited to add: no] Or would that be equivalent to trying to change a copy created in an earlier link in the chain, which, as it is no longer your original object, is "lost" after the chain statement is complete? [Edited to add: yes; see answer here]
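To make the quoted comment concrete, here is a minimal sketch. The toy DataFrame and the names result and chain_error are made up for illustration, and it assumes dropna behaves as the comment describes, i.e. returns None when called with inplace=True:

```python
import pandas as pd

# Hypothetical toy data; None becomes NaN in a float column.
df = pd.DataFrame({"value": [2, 2, 1, None]})

# With inplace=True, dropna modifies df and returns None instead of a DataFrame.
result = df.dropna(inplace=True)
print(result)  # None

# Any further link in the chain is therefore called on None and fails.
chain_error = None
try:
    df.dropna(inplace=True).sort_values("value")
except AttributeError as exc:
    chain_error = exc
    print(type(exc).__name__)
```

So an in-place call can only ever be the last link of a chain, which is the case the question goes on to ask about.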

With large data objects, chaining seems impractical unless it can be done in-place, at least if one wants to keep memory overhead low and computation fast. Is there an alternative implementation of pandas, or an equivalent of R's data.table available in Python, that might be appropriate for my needs? Or are my only options to not chain (and compute quickly) or to chain but make redundant copies of the data, at least transiently?


Solution

  • Let's try it.

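    import pandas as pd
    import numpy as np
    
    # np.nan is used here because the np.NaN alias was removed in NumPy 2.0
    df = pd.DataFrame({'value': [2, 2, 1, 1, 3, 4, 5, np.nan]})
    
    # inplace=True on the last call acts on the temporary object returned by
    # drop_duplicates(), not on df
    df.sort_values('value').drop_duplicates().dropna(inplace=True)
    print(df)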
    

    Expected (if the chain had modified df in place):

       value
    2    1.0
    0    2.0
    4    3.0
    5    4.0
    6    5.0
    

    Result:

       value
    0    2.0
    1    2.0
    2    1.0
    3    1.0
    4    3.0
    5    4.0
    6    5.0
    7    NaN
    

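    Answer: No, inplace=True at the end of the chain does not modify the original DataFrame. It modifies the temporary object produced by the earlier links in the chain, and that object is discarded once the statement finishes.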
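For comparison, a minimal sketch of the usual workaround, as I understand it: drop inplace and bind the result of the chain to a name. The names chained and cleaned are only for illustration; assigning back to df instead of cleaned would work the same way.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"value": [2, 2, 1, 1, 3, 4, 5, np.nan]})

# inplace=True on the last link: the change lands on a temporary object,
# the whole expression evaluates to None, and df is left untouched.
chained = df.sort_values("value").drop_duplicates().dropna(inplace=True)

# Same chain without inplace: each step returns a new DataFrame,
# and the final one is kept by binding it to a name.
cleaned = df.sort_values("value").drop_duplicates().dropna()

print(len(df))       # original still has all 8 rows
print(len(cleaned))  # 5 rows remain after de-duplication and dropna
```

This still creates intermediate objects for each link, which matches the edit in the question: inplace=True does not avoid the copy either, so plain re-assignment costs little extra.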