Search code examples
pythonpython-3.xpandasdataframegroup-by

appending new rows to a Pandas groupby result object


I am new to python and I am trying to insert record into group by result object.

I have below dataframe where ID 1 & 2 has SECTION_GROUP as GROUP 1 and 3 & 4 has GROUP 2 but 5 doesn't have any SECTION_GROUP.

ID ENTITY_NAME ENTITY_NAME SECTION_GROUP DOC_ID
1 dNumber U220059090 GROUP 1 40
2 tDate 6-Dec-22 GROUP 1 40
3 dNumber U220059090 GROUP 2 40
4 tDate 6-Dec-22 GROUP 2 40
5 sCompany bp nan 40

I am trying to get result as below into two separate group.

ID ENTITY_NAME ENTITY_NAME SECTION_GROUP DOC_ID
1 dNumber U220059090 GROUP 1 40
2 tDate 6-Dec-22 GROUP 1 40
5 sCompany bp nan 40
ID ENTITY_NAME ENTITY_NAME SECTION_GROUP DOC_ID
3 dNumber U220059090 GROUP 2 40
4 tDate 6-Dec-22 GROUP 2 40
5 sCompany bp nan 40

I have tried below but I am only getting result for GROUP 2. I need to access both groups result outside of loop. Any help is really appreciated.

import pandas as pd

df = pd.read_csv ('sample.csv',encoding= 'unicode_escape',usecols= ['ID','ENTITY_NAME','ENTITY_VALUE','SECTION_GROUP','DOC_ID'])
distDocIds = df["DOC_ID"].unique()


for docId in distDocIds:
    result = df[df.DOC_ID==docId] # all data for specific Id
    grpResult = df[df.DOC_ID==docId].groupby('SECTION_GROUP') # groupby SECTION_GROUP data
    
    for group in grpResult:
        #check in any record present without SECTION_GROUP
        #if present append group with that record
        foundUnion = result[pd.isnull(result.SECTION_GROUP)] 
        if len(foundUnion) > 0:
            foundUnion = foundUnion.append(group[1])
        **#IF I print foundUnion here I am getting proper result as epxected but I want this access foundUnion outside of loop.
        
newdf = foundUnion.copy()
print(newdf)

Solution

  • If the empty cell is empty string '', you can do:

    mask = df['SECTION_GROUP'].eq('')
    rest = df[mask]
    
    for _, g in df[~mask].groupby('SECTION_GROUP'):
        g = pd.concat([g, rest])
        print(g)
    

    Prints:

       ID ENTITY_NAME ENTITY_NAME.1 SECTION_GROUP  DOC_ID
    0   1     dNumber    U220059090       GROUP 1      40
    1   2       tDate      6-Dec-22       GROUP 1      40
    4   5    sCompany            bp                    40
    
       ID ENTITY_NAME ENTITY_NAME.1 SECTION_GROUP  DOC_ID
    2   3     dNumber    U220059090       GROUP 2      40
    3   4       tDate      6-Dec-22       GROUP 2      40
    4   5    sCompany            bp                    40
    

    If the empty value is NaN:

    mask = df['SECTION_GROUP'].isna()
    rest = df[mask]
    
    for _, g in df[~mask].groupby('SECTION_GROUP'):
        g = pd.concat([g, rest])
        print(g)
    

    Prints:

       ID ENTITY_NAME ENTITY_NAME.1 SECTION_GROUP  DOC_ID
    0   1     dNumber    U220059090       GROUP 1      40
    1   2       tDate      6-Dec-22       GROUP 1      40
    4   5    sCompany            bp           NaN      40
    
       ID ENTITY_NAME ENTITY_NAME.1 SECTION_GROUP  DOC_ID
    2   3     dNumber    U220059090       GROUP 2      40
    3   4       tDate      6-Dec-22       GROUP 2      40
    4   5    sCompany            bp           NaN      40