Search code examples
pythonpandasdataframegroup-by

Counting groups in pandas


I am trying to group the groups that share a common element in a Pandas dataframe, my dataframe looks like this (I added the empty lines just for some extra legibility):

ID       REG
15      01497
15      01493

19      01706
19      01706-A

78      05710
78      01738

143     01626
143     01634

144     01626
144     01644

207     05255
207     01638

209     05255
209     03143

I expect to create a new column where I can grop all the IDs that share a same REG number, so it would look like this:

ID     REG          GROUP
15     01497        1
15     01493        1

19     01706        2
19     01706-A      2

78     05710        3
78     01738        3

143    01626        4
143    01634        4

144    01626        4
144    01644        4

207    05255        5
207    01638        5

209    05255        5
209    03143        5

There are some questions that address similar problems like this, but they are not quite the same, (Perhaps Pandas DataFrame Groupby two columns and get counts or How do I find common values within groups in pandas? or Numbering Groups In Pandas DataFrame) but I am not aiming to perform a sum or count in the traditional way.


Solution

  • With df your dataframe you could try the following:

    df["GROUP"] = (
        df.groupby("ID", sort=False)["REG"].transform("first").to_frame()
        .groupby("REG", sort=False).ngroup().add(1)
    )
    
    • First group by column ID and pick the resp. first REG item for each ID-group.
    • Then map the so built new REG column onto group numbers.

    Result for the sample in the question:

         ID      REG  GROUP
    0    15    01497      1
    1    15    01493      1
    2    19    01706      2
    3    19  01706-A      2
    4    78    05710      3
    5    78    01738      3
    6   143    01626      4
    7   143    01634      4
    8   144    01626      4
    9   144    01644      4
    10  207    05255      5
    11  207    01638      5
    12  209    05255      5
    13  209    03143      5