Search code examples
rcategoriesoverlapping

R to calculate the unique count for Overlap Categories


I have been so frustrated to count the number of clients who made conversion or not in the 3 single categories (app, desktop, web) scenarios and the rest of the overlapping categories (app & Web, app & desktop, web & desktop, app & web & desktop) scenarios. Here is the sample dataset I am working on.

I could figure out the single category count with the aggregate and group_by function in r, however, I can't really figure out how to work on the overlap categories.

Really really thanks so much if someone could help me on this!!! Thanks!!!

df <- data.frame(list(ClientID = c("1", "1", "1", "2", "2", "3", "3" , "3" , "3" , "4" ),
                     device = c("App", "Web", "App", "Web", "Web", "App", "Desktop", "App", "App", "Web"),
                     conversion = c("0", "0", "0", "0", "1", "1", "0", "1", "0", "1")) ) 

Below is the desired outcome:

Scenario                With Conversion      Without Conversion 
App                          
Web                          
Desktop
App & Web
App & Desktop
Web & Desktop
App & Desktop & Web

Solution

  • You can prepare a list of device combinations (i.e. all 7 possibilities), and then use setequal() and unique() in a helper function, like this

    1. list of device combinations, using combn()
    device_combinations = unlist(
      sapply(1:3, \(i) combn(c("App", "Web" ,"Desktop"), i, simplify = F)),
      recursive = F
    )
    
    1. Helper function, f (takes a list of devices, and indicates whether the unique list of devices meets each of the device combinations (it can meet at max one only, by using unique() and setequal()
    f <- function(d) sapply(device_combinations, \(ds) setequal(ds, unique(d)))
    
    1. Now apply the function by conversion and ClientID, unnest, count by conversion and Scenario and pivot to your desired wide format
    df %>% 
      group_by(conversion,ClientID) %>%
      summarize(v = f(device), .groups="drop") %>%
      unnest(v) %>% 
      mutate(
        Scenario=rep(sapply(device_combinations,paste,collapse=","),length.out=n()),
        conversion =if_else(conversion=="0", "Without Conversion", "With Conversion")
      ) %>% 
      group_by(Scenario, conversion) %>% 
      summarize(ct = sum(v), .groups="drop") %>% 
      pivot_wider(id_cols = Scenario, names_from = conversion,values_from = ct)
    

    Output:

      Scenario        `With Conversion` `Without Conversion`
      <chr>                       <int>                <int>
    1 App                             1                    0
    2 App,Desktop                     0                    1
    3 App,Web                         0                    1
    4 App,Web,Desktop                 0                    0
    5 Desktop                         0                    0
    6 Web                             2                    1
    7 Web,Desktop                     0                    0