python python-3.x pandas dataframe group-by

merge group by data into single list or dataframe

Hello I am new to python and not sure how to achieve.

ID	ENTITY_NAME	ENTITY_NAME	SECTION_GROUP	DOC_ID
1	dNumber	U220059090	GROUP 1	40
2	tDate	6-Dec-22	GROUP 1	40
3	dNumber	U220059090	GROUP 2	40
4	tDate	6-Dec-22	GROUP 2	40
5	sCompany	bp	nan	40
6	dNumber	U220059090	GROUP 1	45
7	tDate	6-Dec-22	GROUP 1	45
8	dNumber	U220059090	GROUP 2	45
9	tDate	6-Dec-22	GROUP 2	45
10	sCompany	bp	nan	45

I have applied group by to section_group column on below data and got two different result set in dataframe as below.

import pandas as pd
import numpy as np

df = pd.read_csv ('dups_check_group_v1.csv',encoding= 'unicode_escape',usecols= ['ID','ENTITY_NAME','ENTITY_VALUE','SECTION_GROUP','DOC_ID'])

mask = df['SECTION_GROUP'].isna()
rest = df[mask]

for _, g in df[~mask].groupby('SECTION_GROUP'):
    g = pd.concat([g, rest])
    print(g)

Result

ID ENTITY_NAME   ENTITY_VALUE SECTION_GROUP  DOC_ID
0   1     dNumber  U220059090(C)       GROUP 1      40
1   2       tDate       6-Dec-22       GROUP 1      40
5   6     dNumber  U220059090(C)       GROUP 1      45
6   7       tDate       6-Dec-22       GROUP 1      45
4   5    sCompany             bp           NaN      40
9  10    sCompany             bp           NaN      45

ID ENTITY_NAME   ENTITY_VALUE SECTION_GROUP  DOC_ID
2   3     dNumber  U220059090(C)       GROUP 2      40
3   4       tDate       6-Dec-22       GROUP 2      40
7   8     dNumber  U220059090(C)       GROUP 2      45
8   9       tDate       6-Dec-22       GROUP 2      45
4   5    sCompany             bp           NaN      40
9  10    sCompany             bp           NaN      45

I am trying to merge this two results into single result. like below. any help will be really appreciate.

ID ENTITY_NAME   ENTITY_VALUE SECTION_GROUP  DOC_ID
    0   1     dNumber  U220059090(C)       GROUP 1      40
    1   2       tDate       6-Dec-22       GROUP 1      40
    5   6     dNumber  U220059090(C)       GROUP 1      45
    6   7       tDate       6-Dec-22       GROUP 1      45
    4   5    sCompany             bp           NaN      40
    9  10    sCompany             bp           NaN      45
    2   3     dNumber  U220059090(C)       GROUP 2      40
    3   4       tDate       6-Dec-22       GROUP 2      40
    7   8     dNumber  U220059090(C)       GROUP 2      45
    8   9       tDate       6-Dec-22       GROUP 2      45
    4   5    sCompany             bp           NaN      40
    9  10    sCompany             bp           NaN      45

Solution

Just use concat again:

out = pd.concat([pd.concat([g, rest]) for _, g in
                 df[~mask].groupby('SECTION_GROUP')])

Or rework your loop:

out = pd.concat([d for _, g in df[~mask].groupby('SECTION_GROUP')
                 for d in [g, rest]])

Output:

   ID ENTITY_NAME ENTITY_NAME.1 SECTION_GROUP  DOC_ID
0   1     dNumber    U220059090       GROUP 1      40
1   2       tDate      6-Dec-22       GROUP 1      40
5   6     dNumber    U220059090       GROUP 1      45
6   7       tDate      6-Dec-22       GROUP 1      45
4   5    sCompany            bp           NaN      40
9  10    sCompany            bp           NaN      45
2   3     dNumber    U220059090       GROUP 2      40
3   4       tDate      6-Dec-22       GROUP 2      40
7   8     dNumber    U220059090       GROUP 2      45
8   9       tDate      6-Dec-22       GROUP 2      45
4   5    sCompany            bp           NaN      40
9  10    sCompany            bp           NaN      45