I have a dataset like mentioned below:
df=data.frame(Supplier_id=c("1","2","7","7","7","4","5","8","12","7"), Supplier=c("Tian","Yan","Goldy","Goldy","Goldy","Amy","Lauren","Cassy","Shaan","Goldy"),Date=c("1/17/2019","4/30/2019","11/29/2018","11/29/2018","11/29/2018","5/21/2018","5/23/2018","5/24/2018","6/15/2018","6/20/2018"),Buyer=c("Unclassified","Unclassified","Kelly","Kelly","Kelly","Kelly","Amanda","Echo","Shao","Shao"))
df$Supplier_id=as.numeric(as.character(df$Supplier_id))
Thus, df appears like below:
| Supplier_id | Supplier | Date | Buyer |
|-------------|----------|------------|--------------|
| 1 | Tian | 1/17/2019 | Unclassified |
| 2 | Yan | 4/30/2019 | Unclassified |
| 7 | Goldy | 11/29/2018 | Kelly |
| 7 | Goldy | 11/29/2018 | Kelly |
| 7 | Goldy | 11/29/2018 | Kelly |
| 4 | Amy | 5/21/2018 | Kelly |
| 5 | Lauren | 5/23/2018 | Amanda |
| 8 | Cassy | 5/24/2018 | Echo |
| 12 | Shaan | 6/15/2018 | Shao |
| 7 | Goldy | 6/20/2018 | Shao |
Now, I want to filter out the Supplier_id's that occur only once for each unique Buyer. For example, in the above dataset, Supplier_id '1' and '2' belong to 'unclassified' buyer, but because they have different ids, I do not want them in my final output. However, when we look at the buyer 'Kelly', it has two supplier_ids, '7' and '4', where, '7' is occurring 3 times and '4' only once. So, the output table should have the record with supplier_id='7'. The grouping should be based on 'Buyer'. So it is important to note that since the supplier_id '7' exists for both 'Kelly' and 'Shao', but it should be grouped differently for both these buyers and not considered together.
The expected output should be:
| Supplier_id | Supplier | Date | Buyer_id |
|-------------|:--------:|-----------:|----------|
| 7 | Goldy | 11/29/2018 | Kelly |
| 7 | Goldy | 11/29/2018 | Kelly |
| 7 | Goldy | 11/29/2018 | Kelly |
I have tried using group_by and filter but this would not work because there will be distinct supplier_id's for every buyer.I have also tried using duplicate but not sure how can I group the supplier_id for each buyer.
df <-df %>% group_by(Buyer) %>% filter(Supplier_id>1)
and also this
df2=df[duplicated(df[1]) | duplicated(df[1], fromLast=TRUE),]
EDIT: The original dataset has many such instances and there are n occurrences of different supplier_id for each buyer. What could be other way to get the desired output?
I think you need -
df %>% group_by(Supplier_id, Buyer) %>% filter(n() > 1)