I have a dataframe whereby I need to check whether each group contains a full complement of excepted values.
I have tried grouping by Patient ID
and using transform
to check that each group has all 6 required value(s) Y
in Inclusion
being: -3, -2, -1, 1, 2, 3 otherwise marked as N
Using all
in a lambda doesn't quite get me there
Original df
Patient ID | Surg ID | Surg Type | Surg Date | Medical Appt Date | Inclusion |
---|---|---|---|---|---|
1 | 1 | X | 2022-09-03 | 2022-03-04 | -3 |
1 | 1 | X | 2022-09-03 | 2022-05-04 | -2 |
1 | 1 | X | 2022-09-03 | 2022-06-04 | -1 |
1 | 1 | X | 2022-09-03 | 2022-11-04 | 1 |
1 | 1 | X | 2022-09-03 | 2022-11-29 | 2 |
1 | 1 | X | 2022-09-03 | 2023-01-02 | 3 |
2 | 1 | X | 2022-10-03 | 2022-09-01 | -2 |
2 | 1 | X | 2022-10-03 | 2022-09-11 | -1 |
2 | 1 | X | 2022-10-03 | 2022-10-12 | 1 |
2 | 1 | X | 2022-10-03 | 2022-10-29 | 2 |
2 | 1 | X | 2022-10-03 | 2022-12-12 | 3 |
Desired df
Patient ID | Surg ID | Surg Type | Surg Date | Medical Appt Date | Inclusion | IncludeYN |
---|---|---|---|---|---|---|
1 | 1 | X | 2022-09-03 | 2022-03-04 | -3 | Y |
1 | 1 | X | 2022-09-03 | 2022-05-04 | -2 | Y |
1 | 1 | X | 2022-09-03 | 2022-06-04 | -1 | Y |
1 | 1 | X | 2022-09-03 | 2022-11-04 | 1 | Y |
1 | 1 | X | 2022-09-03 | 2022-11-29 | 2 | Y |
1 | 1 | X | 2022-09-03 | 2023-01-02 | 3 | Y |
2 | 1 | X | 2022-10-03 | 2022-09-01 | -2 | N |
2 | 1 | X | 2022-10-03 | 2022-09-11 | -1 | N |
2 | 1 | X | 2022-10-03 | 2022-10-12 | 1 | N |
2 | 1 | X | 2022-10-03 | 2022-10-29 | 2 | N |
2 | 1 | X | 2022-10-03 | 2022-12-12 | 3 | N |
If you necessarily have -1 before -2, 1 before 2, etc. (and no other values than those, like comments) then a simple groupby.transform
with nunique
and checking that you have all 6
values:
df['IncludeYN'] = np.where(df.groupby(['Patient ID', 'Surg ID'])
['Inclusion'].transform('nunique').eq(6),
'Y', 'N')
If you want to compare to a specific set of values with set.issubset
and ignore possible external values (like "Excluded"
from your other question):
target = {-3, -2, -1, 1, 2, 3}
df['IncludeYN'] = np.where(df.groupby(['Patient ID', 'Surg ID'])
['Inclusion'].transform(lambda x: target.issubset(x)),
'Y', 'N')
Output:
Patient ID Surg ID Surg Type Surg Date Medical Appt Date Inclusion IncludeYN
0 1 1 X 2022-09-03 2022-03-04 -3 Y
1 1 1 X 2022-09-03 2022-05-04 -2 Y
2 1 1 X 2022-09-03 2022-06-04 -1 Y
3 1 1 X 2022-09-03 2022-11-04 1 Y
4 1 1 X 2022-09-03 2022-11-29 2 Y
5 1 1 X 2022-09-03 2023-01-02 3 Y
6 2 1 X 2022-10-03 2022-09-01 -2 N
7 2 1 X 2022-10-03 2022-09-11 -1 N
8 2 1 X 2022-10-03 2022-10-12 1 N
9 2 1 X 2022-10-03 2022-10-29 2 N
10 2 1 X 2022-10-03 2022-12-12 3 N