I have below result set which got populated with the following code. I require to add new Column GROUP_ID in this result.
import pandas as pd
import numpy as np
df = pd.read_csv ('dups_check_group_v1.csv',encoding= 'unicode_escape',usecols= ['ID','ENTITY_NAME','ENTITY_VALUE','SECTION_GROUP','DOC_ID'])
mask = df['SECTION_GROUP'].isna()
rest = df[mask]
out = pd.concat([d for _, g in df[~mask].groupby('SECTION_GROUP')
for d in [g, rest]])
print(out.sort_values('DOC_ID'))
Result
ID ENTITY_NAME ENTITY_VALUE SECTION_GROUP DOC_ID
0 1 dNumber U220059090(C) GROUP 1 40
1 2 tDate 6-Dec-22 GROUP 1 40
4 5 sCompany bp NaN 40
2 3 dNumber U220059090(C) GROUP 2 40
3 4 tDate 6-Dec-22 GROUP 2 40
4 5 sCompany bp NaN 40
5 6 dNumber U220059090(C) GROUP 1 42
6 7 tDate 6-Dec-22 GROUP 1 42
9 10 sCompany bp NaN 42
7 8 dNumber U220059090(C) GROUP 2 42
8 9 tDate 6-Dec-22 GROUP 2 42
9 10 sCompany bp NaN 42
what I am looking to achieve with GROUP_ID is as below. any help is really appreciate.
Update your existing concat statement to assign group ids dynamically
grps = df[~mask].groupby('SECTION_GROUP')
out = pd.concat([d.assign(GROUP_ID=i) for i, (_, g) in enumerate(grps, 1) for d in [g, rest]])