Which Python Magic Method is Pandas Using?


I have a class that contains a pandas DataFrame (self.my_df), and updating self.my_df does not work how I expect it to. Here's a simplified version of the code that illustrates my problem:

import pandas

class my_obj(object):
    @property
    def my_df(self):
        if not hasattr(self, "_my_df"):
            self._my_df = pandas.DataFrame({ "A" : [1,2,3,],
                                             "B" : [4,5,6]}).fillna("")
        print("Retrieving!")
        return self._my_df

    @my_df.setter
    def my_df(self, my_new_df):
        print("Setting!")
        self._my_df = my_new_df.copy()

Here's what happens when I (try to) call these methods (from inside a separate instance method that I don't think matters here):

ipdb> self.my_df
Retrieving!
   A  B
0  1  4
1  2  5
2  3  6
ipdb> self.my_df.loc[2, "B"] = "x"
Retrieving!
ipdb> self.my_df
Retrieving!
   A  B
0  1  4
1  2  5
2  3  x
ipdb> self._my_df
   A  B
0  1  4
1  2  5
2  3  x

I would expect self.my_df.loc[2, "B"] = "x" to call the setter, which it doesn't. Or, if it doesn't call the setter, I would expect self._my_df not to be changed, but it is.

What's happening here? My real situation is much more complex, but I believe this is the root confusion for me.

Thanks for helping me clear this up.


Solution

  • It's easier to see what's happening if you break down the steps. Instead of

    self.my_df.loc[2, "B"] = "x"
    

    consider

    temp = self.my_df         # Clearly this should call the get method
    temp.loc[2, "B"] = "x"    # Changes the pandas object
    

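    These two snippets achieve the same result. The setter will not be called, since you are not assigning to the my_df property of the my_obj object. You are retrieving the contents of self.my_df (which is a DataFrame) and then manipulating it. The magic method involved is __setitem__ on the indexer object that temp.loc returns: an assignment of the form obj[key] = value always calls __setitem__ on obj, and here obj is temp.loc, not your my_obj instance.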

    A my_obj object only holds a reference to a DataFrame, so unless you point my_df to a different object, the setter will not be called. With your code, the my_obj object still points to the same dataframe, but you have manipulated the dataframe's contents.
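
The difference between mutating the DataFrame and rebinding the property can be checked with a small sketch. The `Holder` class and its `calls` list are hypothetical stand-ins for `my_obj`, added here only to record which property method runs. Integer values are used so the column dtype does not change.

```python
import pandas as pd


class Holder:
    """Hypothetical stand-in for my_obj that records property calls."""

    def __init__(self):
        self._my_df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
        self.calls = []  # illustration only: log of "get" / "set" events

    @property
    def my_df(self):
        self.calls.append("get")
        return self._my_df

    @my_df.setter
    def my_df(self, new_df):
        self.calls.append("set")
        self._my_df = new_df.copy()


holder = Holder()

# In-place edit: the getter runs once, then __setitem__ on the .loc
# indexer changes the DataFrame. The property setter is not involved.
holder.my_df.loc[2, "A"] = 30
calls_after_inplace = list(holder.calls)

# Rebinding the attribute is what triggers the property setter.
updated = holder.my_df.copy()  # one more "get"
updated.loc[2, "B"] = 60       # edits the copy, not holder._my_df
holder.my_df = updated         # "set"
calls_after_assign = list(holder.calls)
```

If the setter must run on every change, one option is to edit a copy and assign it back, as in the second half of the sketch. Whether that fits depends on how large the DataFrame is and how often it changes.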