
How do I avoid subclassing a pandas DataFrame using composition?


The pandas documentation recommends against subclassing their data structures. One of their recommended alternatives is to use composition, but they just point readers to a Wikipedia article on composition vs. inheritance. That article and other resources I've found have not helped me understand how to extend a pandas DataFrame using composition. Can someone explain composition in this context and tell me about cases where composition might be a preferred alternative to subclassing pd.DataFrame? A simple example, or a link to information that's more instructive than Wikipedia articles, would be very helpful.

In this question I'm specifically asking how composition should be used in cases where someone might be tempted to subclass pd.DataFrame. I understand there are other solutions to extending a Python object that do not involve composition, and I asked another question about extending pandas DataFrames that resulted in a different solution using a wrapper class.


I didn't understand that "wrapping" and "composition" refer to the same approach here, as noted in MaxYarmolinsky's answer below. The answer to the question I linked to above has a more complete discussion about using composition in this case, which may require handling __getattr__, __getitem__, and __setitem__ properly (I realize this is obvious to people who know what they're doing, but I had to ask my previous question because I had failed to get/set items when I tried on my own).
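The delegation described above (forwarding attribute access and item get/set to the wrapped DataFrame) can be sketched roughly as follows. This is a minimal illustration, not an official pandas pattern; the class name `FrameWrapper` and the attribute `_df` are hypothetical. Note that `__getattr__` alone is not enough for `w["col"]` syntax, because Python looks up special methods such as `__getitem__` on the class, not through `__getattr__`, so they have to be defined explicitly.

```python
import pandas as pd


class FrameWrapper:
    """Hypothetical composition-based wrapper: it *has* a DataFrame instead of *being* one."""

    def __init__(self, data=None, **kwargs):
        # The wrapped DataFrame lives in an ordinary attribute.
        self._df = pd.DataFrame(data, **kwargs)

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, so the wrapper's own
        # attributes take precedence. Guard against infinite recursion if
        # _df does not exist yet (e.g. during copying or unpickling).
        if name == "_df":
            raise AttributeError(name)
        # Forward everything else (.columns, .shape, .mean, ...) to the DataFrame.
        return getattr(self._df, name)

    def __getitem__(self, key):
        # w["a"] -> self._df["a"]
        return self._df[key]

    def __setitem__(self, key, value):
        # w["c"] = ... adds or replaces a column on the wrapped DataFrame.
        self._df[key] = value

    def __len__(self):
        return len(self._df)

    def __repr__(self):
        return repr(self._df)


w = FrameWrapper({"a": [1, 2, 3], "b": [4, 5, 6]})
w["c"] = w["a"] + w["b"]
print(w.columns.tolist())  # ['a', 'b', 'c']
print(w.shape)             # (3, 3)
```

One consequence of this design: forwarded methods such as `w.mean()` or `w.head()` return plain pandas objects, not `FrameWrapper` instances, so any method that should keep the wrapper type must be written on the wrapper itself.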


Solution

  • A little searching shows how to create a simple class like the one you describe through composition:

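      import pandas as pd

      class MyDataFrame:
          def __init__(self, data):
              # Composition: the DataFrame is stored as an attribute, not inherited
              self.coredataframe = pd.DataFrame(data)
              # Placeholder for extra state of your own
              self.otherattribute = None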
    

    You can then add methods and attributes of your own to this class.
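As a rough sketch of what "adding your own methods and attributes" might look like, the class below keeps the same `coredataframe` attribute as the answer and adds a hypothetical `source` attribute and a hypothetical `normalized` method. The method wraps its result in a new `MyDataFrame`, so callers keep the custom type and its metadata instead of getting back a bare DataFrame. The names `source`, `normalized`, and `"sensor.csv"` are illustrative assumptions only.

```python
import pandas as pd


class MyDataFrame:
    def __init__(self, data, source=None):
        # The wrapped DataFrame (same attribute name as in the answer)
        self.coredataframe = pd.DataFrame(data)
        # Hypothetical extra attribute: where the data came from
        self.source = source

    def normalized(self, column):
        """Return a new MyDataFrame with `column` min-max scaled to 0..1.

        A constant column gives 0/0 and therefore NaN values.
        """
        df = self.coredataframe.copy()  # leave the original untouched
        col = df[column]
        df[column] = (col - col.min()) / (col.max() - col.min())
        # Re-wrap so the custom type and its metadata are preserved
        return MyDataFrame(df, source=self.source)

    def __repr__(self):
        return f"MyDataFrame(source={self.source!r})\n{self.coredataframe!r}"


m = MyDataFrame({"x": [0, 5, 10]}, source="sensor.csv")
n = m.normalized("x")
print(n.coredataframe["x"].tolist())  # [0.0, 0.5, 1.0]
print(m.coredataframe["x"].tolist())  # [0, 5, 10]
```

Because `MyDataFrame` does not inherit from `pd.DataFrame`, none of pandas' internal machinery (such as operations that construct new frames of the same type) needs to know about it, which is the main reason the pandas docs suggest composition over subclassing.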