Suppose I have three columns in a pandas DataFrame without any null or empty values.
Example of the data:
Column Vendor | Column Item | Column Facility |
---|---|---|
V1 | I1 | F1 |
V1 | I1 | F2 |
V2 | I1 | F4 |
V1 | I2 | F1 |
V1 | I2 | F2 |
V2 | I2 | F3 |
V3 | I2 | F5 |
V3 | I2 | F6 |
V3 | I3 | F3 |
V1 | I4 | F2 |
V4 | I4 | F4 |
V4 | I4 | F5 |
V1 | I5 | F1 |
V1 | I5 | F4 |
How do I create a DataFrame that contains the following?: "For every item, if there is more than one vendor associated with that item, give me all rows for that item (otherwise exclude all rows associated with that item)."
In this example, I would want a DataFrame of all rows for I1, I2, and I4 because they have more than 1 unique vendor associated with them, and I would exclude all rows involving I3 and I5 as there is only one vendor associated with them.
I tried to create a list of unique vendors for each item loop with a conditional of len(Appended_Vendor_List_var) > 1
, but so far no luck.
You can use a groupby.transform('nunique')
and boolean indexing:
out = df[df.groupby('Column Item')['Column Vendor'].transform('nunique').gt(1)]
Output:
Column Vendor Column Item Column Facility
0 V1 I1 F1
1 V1 I1 F2
2 V2 I1 F4
3 V1 I2 F1
4 V1 I2 F2
5 V2 I2 F3
6 V3 I2 F5
7 V3 I2 F6
9 V1 I4 F2
10 V4 I4 F4
11 V4 I4 F5
Intermediate:
Column Vendor Column Item Column Facility nunique nunique>1
0 V1 I1 F1 2 True
1 V1 I1 F2 2 True
2 V2 I1 F4 2 True
3 V1 I2 F1 3 True
4 V1 I2 F2 3 True
5 V2 I2 F3 3 True
6 V3 I2 F5 3 True
7 V3 I2 F6 3 True
8 V3 I3 F3 1 False
9 V1 I4 F2 2 True
10 V4 I4 F4 2 True
11 V4 I4 F5 2 True
12 V1 I5 F1 1 False
13 V1 I5 F4 1 False
Variant using drop_duplicates
and isin
:
drop = (df.drop_duplicates(['Column Vendor', 'Column Item'])['Column Item']
.drop_duplicates(keep=False)
)
out = df[~df['Column Item'].isin(drop)]