I have DataFrame in Python Pandas like below (ID and COL1 is "object" data type):
ID | COL1 |
---|---|
123 | XX |
123 | XX |
123 | L |
456 | AP |
456 | AP |
77 | C |
88 | LACK |
66 | LACK |
66 | G |
And I need to create new column "COL1_bis" where will be infromation how many unique values in "COL1" has each ID. But if some ID has value "LACK" in "COL1" then we do not count that
So as a result I need something like below:
ID | COL1 | COL1_bis |
---|---|---|
123 | XX | 2 |
123 | XX | 2 |
123 | L | 2 |
456 | AP | 1 |
456 | AP | 1 |
77 | C | 1 |
88 | LACK | 0 |
66 | LACK | 1 |
66 | GG | 1 |
Because:
How can I do that in Python Pandas?
Use Series.mask
for replace LACK
to missing values and then use GroupBy.transform
with DataFrameGroupBy.nunique
:
df['COL1_bis'] = (df['COL1'].mask(df['COL1'].eq('LACK'))
.groupby(df['ID'])
.transform('nunique'))
print (df)
ID COL1 COL1_bis
0 123 XX 2
1 123 XX 2
2 123 L 2
3 456 AP 1
4 456 AP 1
5 77 C 1
6 88 LACK 0
7 66 LACK 1
8 66 G 1