I am trying to group the groups that share a common element in a Pandas dataframe, my dataframe looks like this (I added the empty lines just for some extra legibility):
ID REG
15 01497
15 01493
19 01706
19 01706-A
78 05710
78 01738
143 01626
143 01634
144 01626
144 01644
207 05255
207 01638
209 05255
209 03143
I expect to create a new column where I can grop all the IDs that share a same REG number, so it would look like this:
ID REG GROUP
15 01497 1
15 01493 1
19 01706 2
19 01706-A 2
78 05710 3
78 01738 3
143 01626 4
143 01634 4
144 01626 4
144 01644 4
207 05255 5
207 01638 5
209 05255 5
209 03143 5
There are some questions that address similar problems like this, but they are not quite the same, (Perhaps Pandas DataFrame Groupby two columns and get counts or How do I find common values within groups in pandas? or Numbering Groups In Pandas DataFrame) but I am not aiming to perform a sum or count in the traditional way.
With df
your dataframe you could try the following:
df["GROUP"] = (
df.groupby("ID", sort=False)["REG"].transform("first").to_frame()
.groupby("REG", sort=False).ngroup().add(1)
)
ID
and pick the resp. first REG
item for each ID
-group.REG
column onto group numbers.Result for the sample in the question:
ID REG GROUP
0 15 01497 1
1 15 01493 1
2 19 01706 2
3 19 01706-A 2
4 78 05710 3
5 78 01738 3
6 143 01626 4
7 143 01634 4
8 144 01626 4
9 144 01644 4
10 207 05255 5
11 207 01638 5
12 209 05255 5
13 209 03143 5