python, python-3.x, pandas, dataframe, representation

Object representation in Pandas.DataFrame


Assume I have the following class, 'MyClass'.

class MyClass:
    def __repr__(self):
        return 'Myclass()'

    def __str__(self):
        return 'Meh'

instances = [MyClass() for i in range(5)]

Five instances are created and stored in the instances list. Now let's check its contents:

>>> instances
[Myclass(), Myclass(), Myclass(), Myclass(), Myclass()]

To display the list, Python calls each element's __repr__ method. However, when the same instances list is passed to a pandas.DataFrame, the displayed representation changes and the __str__ method seems to be called instead.

import pandas as pd

df = pd.DataFrame(data=instances)
>>> df
     0
0  Meh
1  Meh
2  Meh
3  Meh
4  Meh
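The difference can be reproduced without pandas. In plain Python, str(), print() and f-strings go through __str__, while repr() goes through __repr__. Built-in containers such as list format their elements with repr() even when str() is called on the container itself. The DataFrame output above therefore looks like str() being applied to each cell. The class is redefined in this minimal sketch so it runs on its own.

```python
class MyClass:
    def __repr__(self):
        return 'Myclass()'

    def __str__(self):
        return 'Meh'


obj = MyClass()
instances = [MyClass() for _ in range(2)]

# str() and f-strings use __str__, repr() uses __repr__
print(str(obj))    # Meh
print(f'{obj}')    # Meh
print(repr(obj))   # Myclass()

# A list formats its elements with repr(), even when str() is called on the list
print(str(instances))  # [Myclass(), Myclass()]
```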

Why has the object's representation changed? Can I determine which representation is used in the DataFrame?


Solution

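  • The data is still stored as the original MyClass objects (the column has dtype object); nothing is converted. It seems pandas simply formats each cell with str(), which calls __str__ implicitly, when it renders the DataFrame for display.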

    You can verify that by calling:

    df[0].map(type)
    

    It calls type for each element in the column and returns:

    Out[572]: 
    0    <class '__main__.MyClass'>
    1    <class '__main__.MyClass'>
    2    <class '__main__.MyClass'>
    3    <class '__main__.MyClass'>
    4    <class '__main__.MyClass'>
    Name: 0, dtype: object
    
    # Likewise, you get the repr() string
    # of each object with:
    df[0].map(repr)
    Out[578]: 
    0    Myclass()
    1    Myclass()
    2    Myclass()
    3    Myclass()
    4    Myclass()
    Name: 0, dtype: object
    

    By the way, if you want to create a DataFrame with an explicitly named column, pass a dict instead:

    df = pd.DataFrame({'my_instances': instances})
    

    This way, the column is named my_instances instead of the default 0.
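To choose which representation is shown, one option is the formatters argument of DataFrame.to_string, which accepts a callable per column (by name or position). Passing repr renders the cells as Myclass() instead of Meh. Another option is mapping repr over every column to get a display copy whose cells are plain strings. Neither approach changes the stored objects. This is a minimal sketch reusing the question's class and the named-column DataFrame; the variable names are illustrative only.

```python
import pandas as pd


class MyClass:
    def __repr__(self):
        return 'Myclass()'

    def __str__(self):
        return 'Meh'


df = pd.DataFrame({'my_instances': [MyClass() for _ in range(5)]})

# Default rendering: object cells are formatted via str(), so they show 'Meh'
default_text = df.to_string()

# Per-column formatter: render the 'my_instances' column with repr() instead
repr_text = df.to_string(formatters={'my_instances': repr})

# Alternative: a display copy whose cells are the repr() strings
repr_df = df.apply(lambda col: col.map(repr))

print(default_text)
print(repr_text)
```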