Search code examples
pythonpandaspandas-groupby

Autoincrement indexing after groupby with pandas on the original table


I cannot solve a very easy/simple problem in pandas. :(

I have the following table:

df = pd.DataFrame(data=dict(a=[1, 1, 1,2, 2, 3,1], b=["A", "A","B","A", "B", "A","A"]))
df

Out[96]: 
   a  b
0  1  A
1  1  A
2  1  B
3  2  A
4  2  B
5  3  A
6  1  A

I would like to make an incrementing ID of each grouped (grouped by columns a and b) unique item. So the result would like like this (column c):

Out[98]: 
   a  b  c
0  1  A  1
1  1  A  1
2  1  B  2
3  2  A  3
4  2  B  4
5  3  A  5
6  1  A  1

I tried with:

df.groupby(["a", "b"]).nunique().cumsum().reset_index()

Result:

Out[105]: 
   a  b  c
0  1  A  1
1  1  B  2
2  2  A  3
3  2  B  4
4  3  A  5

Unfortunatelly this works only for the grouped by dataset and not on the original dataset. As you can see in the original table I have 7 rows and the grouped by returns only 5.

So could someone please help me on how to get the desired table:

   a  b  c
0  1  A  1
1  1  A  1
2  1  B  2
3  2  A  3
4  2  B  4
5  3  A  5
6  1  A  1

Thank you in advance!


Solution

  • groupby + ngroup

    df['c'] = df.groupby(['a', 'b']).ngroup() + 1
    

       a  b  c
    0  1  A  1
    1  1  A  1
    2  1  B  2
    3  2  A  3
    4  2  B  4
    5  3  A  5
    6  1  A  1