Is it possible to write an accessor for pandas GroupBy objects?


I am wondering whether it is possible to implement a pandas API accessor for GroupBy objects, similar to the one registered with register_dataframe_accessor (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.api.extensions.register_dataframe_accessor.html#pandas.api.extensions.register_dataframe_accessor).

Using the following code, I can apply the accessor to the group items:

import pandas as pd
import numpy as np

@pd.api.extensions.register_dataframe_accessor("geo")
class GeoAccessor:
    def __init__(self, pandas_obj):
        self._obj = pandas_obj

    @property
    def center(self):
        # return the geographic center point of this DataFrame
        lat = self._obj.latitude
        lon = self._obj.longitude
        return (float(lon.mean()), float(lat.mean()))


if __name__ == "__main__":
    ds = pd.DataFrame({"longitude": np.linspace(0, 10),
                       "latitude": np.linspace(0, 20)})
    ds['grp'] = ds['longitude'].astype(int)
    # groupby yields (key, sub-DataFrame) pairs; the accessor works on each sub-DataFrame
    for _, group in ds.groupby(by='grp'):
        print(group.geo.center)

which results in

(0.40816326530612246, 0.8163265306122449)
(1.4285714285714286, 2.857142857142857)
(2.4489795918367347, 4.8979591836734695)
(3.4693877551020407, 6.938775510204081)
(4.4897959183673475, 8.979591836734695)
(5.510204081632653, 11.020408163265307)
(6.530612244897959, 13.061224489795919)
(7.551020408163266, 15.102040816326532)
(8.571428571428573, 17.142857142857146)
(9.489795918367347, 18.979591836734695)
(10.0, 20.0)

Now, how could I do this directly using a syntax similar to:

ds.groupby('grp').geo.center

The error message I get for this is

ds.groupby(by='grp').geo.center
Traceback (most recent call last):

  File "C:\.../ipykernel_11200/2937951017.py", line 1, in <module>
    ds.groupby(by='grp').geo.center

  File "C:\...\anaconda3\lib\site-packages\pandas\core\groupby\groupby.py", line 911, in __getattr__
    raise AttributeError(

AttributeError: 'DataFrameGroupBy' object has no attribute 'geo'

Solution

  • Do you essentially want to do ds.groupby('grp').apply(lambda d: d.geo.center)?

    It might be possible to implement this as an accessor, but you would have to borrow the source code for CachedAccessor and _register_accessor from pandas, define your accessor class, and then attach it to the GroupBy class using _register_accessor. For an example, see https://github.com/staircase-dev/piso/blob/master/piso/accessor.py

    Your accessor object will hold a reference to the GroupBy object it is attached to. You would define a center property that simply returns the result of .apply(lambda d: d.geo.center) on that GroupBy object. This is a lot of work for what seems to be syntactic sugar, though.
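
A minimal sketch of that idea, under some assumptions: instead of copying pandas' private CachedAccessor and _register_accessor, it uses a small hand-written descriptor without caching. The names _AccessorDescriptor, register_groupby_accessor and GeoGroupByAccessor are hypothetical and not part of pandas, and attaching attributes to DataFrameGroupBy with setattr is not an officially supported extension point, so it may behave differently across pandas versions.

```python
import numpy as np
import pandas as pd
from pandas.core.groupby import DataFrameGroupBy


@pd.api.extensions.register_dataframe_accessor("geo")
class GeoAccessor:
    """DataFrame accessor from the question."""

    def __init__(self, pandas_obj):
        self._obj = pandas_obj

    @property
    def center(self):
        lat = self._obj.latitude
        lon = self._obj.longitude
        return (float(lon.mean()), float(lat.mean()))


class _AccessorDescriptor:
    """Hypothetical stand-in for pandas' private CachedAccessor (no caching)."""

    def __init__(self, accessor_cls):
        self._accessor_cls = accessor_cls

    def __get__(self, obj, owner):
        if obj is None:
            # Accessed on the class itself, e.g. DataFrameGroupBy.geo
            return self._accessor_cls
        # Accessed on an instance: wrap the GroupBy object
        return self._accessor_cls(obj)


def register_groupby_accessor(name):
    """Hypothetical helper: attach an accessor class to DataFrameGroupBy."""

    def decorator(accessor_cls):
        setattr(DataFrameGroupBy, name, _AccessorDescriptor(accessor_cls))
        return accessor_cls

    return decorator


@register_groupby_accessor("geo")
class GeoGroupByAccessor:
    def __init__(self, groupby_obj):
        self._gb = groupby_obj

    @property
    def center(self):
        # Select only the coordinate columns, then reuse the
        # DataFrame accessor on each group's sub-DataFrame.
        coords = self._gb[["longitude", "latitude"]]
        return coords.apply(lambda d: d.geo.center)


ds = pd.DataFrame({"longitude": np.linspace(0, 10),
                   "latitude": np.linspace(0, 20)})
ds["grp"] = ds["longitude"].astype(int)

# One (lon, lat) tuple per group, indexed by grp
centers = ds.groupby("grp").geo.center
print(centers)
```

Here ds.groupby("grp").geo.center gives the same values as the explicit loop in the question, collected into a single object indexed by grp. The accessor only forwards to .apply(lambda d: d.geo.center), which is why it amounts to syntactic sugar.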