Error "Cannot access callable attribute 'sample' of 'DataFrameGroupBy' objects, try using the 'apply' method"


Mock data:

import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'country': ['USA', 'USA', 'USA', 'USA', 'USA', 'Canada', 'Canada', 'Canada', 'USA', 'Canada']
})

Let's say I want to sample one observation for each country:

df.groupby('country').sample(1)

I get this error:

AttributeError: Cannot access callable attribute 'sample' of 'DataFrameGroupBy' objects, try using the 'apply' method
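For anyone hitting the same message, one quick way to check whether the installed pandas supports sampling directly on a groupby object is sketched here. It is a minimal check on a small illustrative DataFrame: on older pandas, accessing .sample on the groupby object raises the AttributeError shown above, so hasattr() should return False there and True on versions that have GroupBy.sample.

```python
import pandas as pd

# Small illustrative frame; any DataFrame with a grouping column works.
df = pd.DataFrame({"id": [1, 2, 3], "country": ["USA", "USA", "Canada"]})
grouped = df.groupby("country")

# If .sample is not a GroupBy method in this pandas version, attribute
# access raises AttributeError, which hasattr() turns into False.
print(pd.__version__)
print(hasattr(grouped, "sample"))
```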

I have tried resetting the index, but that didn't solve the problem. I have also tried the answer here, but it didn't work either. What am I doing wrong?

EDIT: this question has a follow-up here.


Solution

  • As the error suggests, use apply(). Passing group_keys=False removes the extra country level that apply() would otherwise add to the index.

    >>> df.groupby('country', group_keys=False).apply(lambda g: g.sample(1))
       id country
    6   7  Canada
    2   3     USA
    

    Edit: This appears to be a pandas version mismatch. GroupBy.sample (not groupby itself) was introduced in pandas 1.1.0, so older versions do not have it. I ran the OP's code on a newer version and it works as well.

    You will need to upgrade pandas to 1.1.0 or later, e.g. with pip3 install --upgrade pandas
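If the same code has to run on both old and new pandas, the two approaches from this answer can be combined. The sketch below uses a hypothetical helper, sample_per_group (not part of pandas), that calls GroupBy.sample when it exists (pandas 1.1.0+) and otherwise falls back to the apply() workaround. Passing an integer random_state makes the draw reproducible across runs.

```python
import pandas as pd


def sample_per_group(df, by, n=1, random_state=None):
    """Hypothetical helper: draw n rows from each group of df.

    Uses GroupBy.sample when available (pandas >= 1.1.0) and falls back
    to the apply() workaround on older versions.
    """
    # group_keys=False keeps the original row index in the apply() fallback.
    grouped = df.groupby(by, group_keys=False)
    if hasattr(grouped, "sample"):
        return grouped.sample(n=n, random_state=random_state)
    # Older pandas: sample inside each group via apply().
    return grouped.apply(lambda g: g.sample(n=n, random_state=random_state))


df = pd.DataFrame({
    'id': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'country': ['USA', 'USA', 'USA', 'USA', 'USA', 'Canada', 'Canada', 'Canada', 'USA', 'Canada']
})

# One random row per country; which rows appear depends on random_state.
result = sample_per_group(df, 'country', n=1, random_state=0)
print(result)
```

Which rows are picked is random, so the printed frame will differ from the output shown in the answer unless the same seed and pandas version are used.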