Search code examples
pythonpandaspandas-groupby

Include indices in Pandas groupby results


With Pandas groupby, I can do things like this:

>>> df = pd.DataFrame(
...     {
...         "A": ["foo", "bar", "bar", "foo", "bar"],
...         "B": ["one", "two", "three", "four", "five"],
...     }
... )
>>> print(df)
     A      B
0  foo    one
1  bar    two
2  bar  three
3  foo   four
4  bar   five
>>> print(df.groupby('A')['B'].unique())
A
bar    [two, three, five]
foo           [one, four]
Name: B, dtype: object

What I am looking for is output that produces a list of indices instead of a list of column B:

A
bar    [1, 2, 4]
foo    [0, 3]

However, groupby('A').index.unique() doesn't work. What syntax would provide me the output I'm after? I'd be more than happy to do this in some other way than with groupby, although I do need to group by two columns in my real application.


Solution

  • You do not necessarily need to have a label in groupby, you can use a grouping object.

    This enables things like:

    df.index.to_series().groupby(df['A']).unique()
    

    output:

    A
    bar    [1, 2, 4]
    foo       [0, 3]
    dtype: object
    
    getting the indices of the unique B values:
    df[~df[['A', 'B']].duplicated()].index.to_series().groupby(df['A']).unique()