I have a dataframe:
UserId <- c("A", "A", "A", "B", "B", "B")
SellerId <- c("X", "X", "Y", "Y", "Z", "Z")
Product <- c("ball", "ball", "ball", "ball", "doll", "doll")
SalesDate <- c("2022-01-01", "2022-01-01", "2022-01-02", "2022-01-04", "2022-01-06", "2022-01-07")
sales <- data.frame(UserId, SellerId, Product, SalesDate)
And I want to find sales for which the UserId, SellerId, Product and SalesDate are all the same.
I've been thinking for a long time about how to apply even one of these criteria, and nothing comes to mind. The table I should be left with in this case is:
UserId | SellerId | Product | SalesDate |
---|---|---|---|
A | X | ball | 2022-01-01 |
A | X | ball | 2022-01-01 |
The UserId is the same, the seller is the same, the product is the same and the sales date is the same. The problem is that I'm not looking for specific users or specific products.
I would like to find all users who bought the same product twice (no matter what the product is, since the list is long), and likewise for the purchase date (the specific date doesn't matter, it just needs to be the same for the same user).
Do you have any ideas on how to write even part of the code?
Using `dplyr`, you can group by all variables with `group_by_all()` and then use `filter()` to drop any group that does not have more than one record.
library(dplyr)
sales %>% group_by_all() %>% filter(n() > 1)
# A tibble: 2 × 4
# Groups: UserId, SellerId, Product, SalesDate [1]
UserId SellerId Product SalesDate
<chr> <chr> <chr> <chr>
1 A X ball 2022-01-01
2 A X ball 2022-01-01
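If you would rather not load a package, the same idea can be sketched in base R with `duplicated()`. This is only an alternative illustration, not part of the answer above. The names `is_dup`, `repeated_sales`, `key_cols`, `is_dup_key` and `repeated_by_key` are made up for this example, and the subset-of-columns variation is an assumption about what a looser criterion might look like.

```r
# Rebuild the example data from the question
sales <- data.frame(
  UserId    = c("A", "A", "A", "B", "B", "B"),
  SellerId  = c("X", "X", "Y", "Y", "Z", "Z"),
  Product   = c("ball", "ball", "ball", "ball", "doll", "doll"),
  SalesDate = c("2022-01-01", "2022-01-01", "2022-01-02",
                "2022-01-04", "2022-01-06", "2022-01-07")
)

# duplicated() only flags the second and later copies of a row, so scan
# from both ends to mark every member of a repeated group
is_dup <- duplicated(sales) | duplicated(sales, fromLast = TRUE)
repeated_sales <- sales[is_dup, ]

# Hypothetical variation: match on a subset of columns only
# (same user, same product, same date, any seller)
key_cols <- c("UserId", "Product", "SalesDate")
is_dup_key <- duplicated(sales[key_cols]) |
  duplicated(sales[key_cols], fromLast = TRUE)
repeated_by_key <- sales[is_dup_key, ]

print(repeated_sales)
```

Unlike the grouped `dplyr` result, this returns a plain data frame with the original row names kept, which can be handy if you need to trace the rows back to `sales`.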