With Pandas groupby, I can do things like this:
>>> df = pd.DataFrame(
... {
... "A": ["foo", "bar", "bar", "foo", "bar"],
... "B": ["one", "two", "three", "four", "five"],
... }
... )
>>> print(df)
A B
0 foo one
1 bar two
2 bar three
3 foo four
4 bar five
>>> print(df.groupby('A')['B'].unique())
A
bar [two, three, five]
foo [one, four]
Name: B, dtype: object
What I am looking for is output that produces a list of indices instead of a list of column B:
A
bar [1, 2, 4]
foo [0, 3]
However, groupby('A').index.unique() doesn't work. What syntax would provide me the output I'm after? I'd be more than happy to do this in some other way than with groupby, although I do need to group by two columns in my real application.
You do not necessarily need to have a label in groupby
, you can use a grouping object.
This enables things like:
df.index.to_series().groupby(df['A']).unique()
output:
A
bar [1, 2, 4]
foo [0, 3]
dtype: object
df[~df[['A', 'B']].duplicated()].index.to_series().groupby(df['A']).unique()