Search code examples
rfiltercomparisonrows

How to filter for criteria where rows are compared to one another in R


I have a dataframe:

UserId <- c("A", "A", "A", "B", "B", "B")
SellerId <- c("X", "X", "Y", "Y", "Z", "Z")
Product <- c("ball", "ball", "ball", "ball", "doll", "doll")
SalesDate <- c("2022-01-01", "2022-01-01", "2022-01-02", "2022-01-04", "2022-01-06", "2022-01-07")

sales <- data.frame(UserId, SellerId, Product, SalesDate)

And I want to find sales for which:

  • the same user bought the same product twice from the same seller on the same day, but of course I need to do it on a larger scale.

I've been thinking for a long time how to even use one of these criteria and nothing comes to mind. The table I should be left with in this case is:

UserId SellerId Product SalesDate
A X ball 2022-01-01
A X ball 2022-01-01

UserId is the same, seller is the same, the product is the same and salesdate is the same. The problem is that I don't look for specific users or specific products.

I would like to find all users who bought the same product twice (no matter what the product is - the list is long), the same with purchasedate (the date doesn't matter, it needs to be the same for the same user).

Do you have any ideas how to do even a part of the code?


Solution

  • Using dplyr, you can group_by_all variables, and filter out anything that do not have more than 1 records.

    library(dplyr)
    
    sales %>% group_by_all() %>% filter(n() > 1)
    
    # A tibble: 2 × 4
    # Groups:   UserId, SellerId, Product, SalesDate [1]
      UserId SellerId Product SalesDate 
      <chr>  <chr>    <chr>   <chr>     
    1 A      X        ball    2022-01-01
    2 A      X        ball    2022-01-01