I am new to python and I am trying to insert record into group by result object.
I have below dataframe where ID 1 & 2 has SECTION_GROUP as GROUP 1 and 3 & 4 has GROUP 2 but 5 doesn't have any SECTION_GROUP.
ID | ENTITY_NAME | ENTITY_NAME | SECTION_GROUP | DOC_ID |
---|---|---|---|---|
1 | dNumber | U220059090 | GROUP 1 | 40 |
2 | tDate | 6-Dec-22 | GROUP 1 | 40 |
3 | dNumber | U220059090 | GROUP 2 | 40 |
4 | tDate | 6-Dec-22 | GROUP 2 | 40 |
5 | sCompany | bp | nan | 40 |
I am trying to get result as below into two separate group.
ID | ENTITY_NAME | ENTITY_NAME | SECTION_GROUP | DOC_ID |
---|---|---|---|---|
1 | dNumber | U220059090 | GROUP 1 | 40 |
2 | tDate | 6-Dec-22 | GROUP 1 | 40 |
5 | sCompany | bp | nan | 40 |
ID | ENTITY_NAME | ENTITY_NAME | SECTION_GROUP | DOC_ID |
---|---|---|---|---|
3 | dNumber | U220059090 | GROUP 2 | 40 |
4 | tDate | 6-Dec-22 | GROUP 2 | 40 |
5 | sCompany | bp | nan | 40 |
I have tried below but I am only getting result for GROUP 2. I need to access both groups result outside of loop. Any help is really appreciated.
import pandas as pd
df = pd.read_csv ('sample.csv',encoding= 'unicode_escape',usecols= ['ID','ENTITY_NAME','ENTITY_VALUE','SECTION_GROUP','DOC_ID'])
distDocIds = df["DOC_ID"].unique()
for docId in distDocIds:
result = df[df.DOC_ID==docId] # all data for specific Id
grpResult = df[df.DOC_ID==docId].groupby('SECTION_GROUP') # groupby SECTION_GROUP data
for group in grpResult:
#check in any record present without SECTION_GROUP
#if present append group with that record
foundUnion = result[pd.isnull(result.SECTION_GROUP)]
if len(foundUnion) > 0:
foundUnion = foundUnion.append(group[1])
**#IF I print foundUnion here I am getting proper result as epxected but I want this access foundUnion outside of loop.
newdf = foundUnion.copy()
print(newdf)
If the empty cell is empty string ''
, you can do:
mask = df['SECTION_GROUP'].eq('')
rest = df[mask]
for _, g in df[~mask].groupby('SECTION_GROUP'):
g = pd.concat([g, rest])
print(g)
Prints:
ID ENTITY_NAME ENTITY_NAME.1 SECTION_GROUP DOC_ID
0 1 dNumber U220059090 GROUP 1 40
1 2 tDate 6-Dec-22 GROUP 1 40
4 5 sCompany bp 40
ID ENTITY_NAME ENTITY_NAME.1 SECTION_GROUP DOC_ID
2 3 dNumber U220059090 GROUP 2 40
3 4 tDate 6-Dec-22 GROUP 2 40
4 5 sCompany bp 40
If the empty value is NaN
:
mask = df['SECTION_GROUP'].isna()
rest = df[mask]
for _, g in df[~mask].groupby('SECTION_GROUP'):
g = pd.concat([g, rest])
print(g)
Prints:
ID ENTITY_NAME ENTITY_NAME.1 SECTION_GROUP DOC_ID
0 1 dNumber U220059090 GROUP 1 40
1 2 tDate 6-Dec-22 GROUP 1 40
4 5 sCompany bp NaN 40
ID ENTITY_NAME ENTITY_NAME.1 SECTION_GROUP DOC_ID
2 3 dNumber U220059090 GROUP 2 40
3 4 tDate 6-Dec-22 GROUP 2 40
4 5 sCompany bp NaN 40