Search code examples
pythonpython-3.xpandasdataframegroup-by

create new column with incremental values


I have below result set which got populated with the following code. I require to add new Column GROUP_ID in this result.

import pandas as pd
import numpy as np

df = pd.read_csv ('dups_check_group_v1.csv',encoding= 'unicode_escape',usecols= ['ID','ENTITY_NAME','ENTITY_VALUE','SECTION_GROUP','DOC_ID'])

mask = df['SECTION_GROUP'].isna()
rest = df[mask]

out = pd.concat([d for _, g in df[~mask].groupby('SECTION_GROUP')
                 for d in [g, rest]])
print(out.sort_values('DOC_ID'))

Result

    ID ENTITY_NAME   ENTITY_VALUE SECTION_GROUP  DOC_ID
0   1     dNumber  U220059090(C)       GROUP 1      40
1   2       tDate       6-Dec-22       GROUP 1      40
4   5    sCompany             bp           NaN      40
2   3     dNumber  U220059090(C)       GROUP 2      40
3   4       tDate       6-Dec-22       GROUP 2      40
4   5    sCompany             bp           NaN      40
5   6     dNumber  U220059090(C)       GROUP 1      42
6   7       tDate       6-Dec-22       GROUP 1      42
9  10    sCompany             bp           NaN      42
7   8     dNumber  U220059090(C)       GROUP 2      42
8   9       tDate       6-Dec-22       GROUP 2      42
9  10    sCompany             bp           NaN      42

what I am looking to achieve with GROUP_ID is as below. any help is really appreciate.

enter image description here


Solution

  • Update your existing concat statement to assign group ids dynamically

    grps = df[~mask].groupby('SECTION_GROUP')
    out = pd.concat([d.assign(GROUP_ID=i) for i, (_, g) in enumerate(grps, 1) for d in [g, rest]])