Search code examples
rdatedplyrdata.tablerolling-computation

R Compare Lists and Identify Individual Changes


Here's an ice cream shop data frame in R, that shows the flavors at a store month to month.

df <- data.frame(date = as.Date(c(rep("2022-01-01", 3), 
                                  rep("2022-02-01", 3),
                                  rep("2022-03-01", 4))),
                 flavor = c("Almond", "Apple", "Apricot", 
                            "Almond", "Maple", "Mint",
                            "Almond", "Maple", "Mint", "Pumpkin"))

#>         date  flavor
#> 1  2022-01-01  Almond
#> 2  2022-01-01   Apple
#> 3  2022-01-01 Apricot
#> 4  2022-02-01  Almond
#> 5  2022-02-01   Maple
#> 6  2022-02-01    Mint
#> 7  2022-03-01  Almond
#> 8  2022-03-01   Maple
#> 9  2022-03-01    Mint
#> 10 2022-03-01 Pumpkin

I created a script to show the ice cream flavors added for any particular month. You can also see flavors removed, as a list. Note that there were not any flavors removed in March (2022-03-01, flavors.removed equals <chr [0]>).

library(dplyr)
library(tidyr)
df %>% 
  group_by(date) %>% 
  summarize(flavors = list(flavor)) %>% 
  mutate(flavors.added = mapply(setdiff, flavors, lag(flavors)),
         flavors.removed = mapply(setdiff, lag(flavors), flavors)) %>% 
  ungroup %>% 
  select(-flavors) %>% 
  unnest_longer(flavors.added)

#> # A tibble: 6 x 3
#>   date       flavors.added flavors.removed
#>   <date>     <chr>         <list>         
#> 1 2022-01-01 Almond        <NULL>         
#> 2 2022-01-01 Apple         <NULL>         
#> 3 2022-01-01 Apricot       <NULL>         
#> 4 2022-02-01 Maple         <chr [2]>      
#> 5 2022-02-01 Mint          <chr [2]>      
#> 6 2022-03-01 Pumpkin       <chr [0]>  

When I attempt to capture information on the flavors removed via a call to unnest_longer(flavors.removed) I end up inadvertently filtering out all information for 2022-03-01 because the flavors.removed list is empty (<chr [0]>) for this 2022-03-01 time period.

library(dplyr)
library(tidyr)
df %>% 
  group_by(date) %>% 
  summarize(flavors = list(flavor)) %>% 
  mutate(flavors.added = mapply(setdiff, flavors, lag(flavors)),
         flavors.removed = mapply(setdiff, lag(flavors), flavors)) %>% 
  ungroup %>% 
  select(-flavors) %>% 
  unnest_longer(flavors.added) %>% 
  unnest_longer(flavors.removed) %>% 
  pivot_longer(-date, names_to = "type", values_to = "flavor") %>% 
  arrange(date, type) %>% 
  unique()

#> # A tibble: 8 x 3
#>   date       type            flavor 
#>   <date>     <chr>           <chr>  
#> 1 2022-01-01 flavors.added   Almond 
#> 2 2022-01-01 flavors.added   Apple  
#> 3 2022-01-01 flavors.added   Apricot
#> 4 2022-01-01 flavors.removed NA     
#> 5 2022-02-01 flavors.added   Maple  
#> 6 2022-02-01 flavors.added   Mint   
#> 7 2022-02-01 flavors.removed Apple  
#> 8 2022-02-01 flavors.removed Apricot

Is there a better approach to individually identify flavors added and removed, month to month? I need to regain the ninth row as shown below, that gets filtered out utilizing my flawed approach.

#> # A tibble: 9 x 3
#>   date       type            flavor 
#>   <date>     <chr>           <chr>  
#> 1 2022-01-01 flavors.added   Almond 
#> 2 2022-01-01 flavors.added   Apple  
#> 3 2022-01-01 flavors.added   Apricot
#> 4 2022-01-01 flavors.removed NA     
#> 5 2022-02-01 flavors.added   Maple  
#> 6 2022-02-01 flavors.added   Mint   
#> 7 2022-02-01 flavors.removed Apple  
#> 8 2022-02-01 flavors.removed Apricot
#> 9 2022-03-01 flavors.added   Pumpkin

Solution

  • A possible solution if you don't need the NA on row 4:

    df %>% 
      group_by(date) %>% 
      summarize(flavors = list(flavor)) %>% 
      mutate(flavors.added = mapply(setdiff, flavors, lag(flavors)),
             flavors.removed = mapply(setdiff, lag(flavors), flavors)) %>% 
      ungroup %>% 
      select(-flavors) %>% 
      pivot_longer(-date, names_to = "type", values_to = "flavor") %>% 
      unnest(flavor)
    
    # A tibble: 8 × 3
      date       type            flavor 
      <date>     <chr>           <chr>  
    1 2022-01-01 flavors.added   Almond 
    2 2022-01-01 flavors.added   Apple  
    3 2022-01-01 flavors.added   Apricot
    4 2022-02-01 flavors.added   Maple  
    5 2022-02-01 flavors.added   Mint   
    6 2022-02-01 flavors.removed Apple  
    7 2022-02-01 flavors.removed Apricot
    8 2022-03-01 flavors.added   Pumpkin