Using a list-column as an input to the LHS of case_when


I am running into problems using a list column as an input to the LHS of dplyr::case_when().

library("dplyr")
library("tibble")
library("purrr")

# create a tibble and add a list column
tbl = tibble(a = c(1,2,3))
(b = list(c(1,7,8), c(1,7,8),c(1,2,3)))
#> [[1]]
#> [1] 1 7 8
#> 
#> [[2]]
#> [1] 1 7 8
#> 
#> [[3]]
#> [1] 1 2 3
tbl$b = b

I want a new column identifying whether each value in tbl$a is in the vector of values for the same observation in the list column tbl$b.

When I try this I get c(0,0,0), but I am expecting c(1,0,1).

tbl %>% mutate(a_in_b = case_when(a %in% b ~ 1,
                                  TRUE ~ 0))
#> # A tibble: 3 × 3
#>       a b         a_in_b
#>   <dbl> <list>     <dbl>
#> 1     1 <dbl [3]>      0
#> 2     2 <dbl [3]>      0
#> 3     3 <dbl [3]>      0

I'm not sure whether this is relevant, but the following two expressions also give different results, for reasons that are not clear to me:

tbl$a[1] %in% tbl$b[1] # evaluates as FALSE
tbl$a[1] %in% tbl$b[[1]] # evaluates as TRUE

I could use a map2() approach, e.g.:

map2(tbl$a, tbl$b, \(x,y) x %in% y) # this works

However, my real-world data has multiple list columns, and the map approach seems to become overly complicated.


Solution

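  • Use rowwise(). mutate() is then evaluated one row at a time, so b refers to that row's vector rather than to the whole list column, and the test becomes straightforward.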

    library(dplyr)

    tbl %>%
      rowwise() %>%                      # evaluate mutate() row by row
      mutate(a_in_b = +(a %in% b)) %>%   # unary + converts TRUE/FALSE to 1/0
      ungroup()
    

    giving

    # A tibble: 3 × 3
          a b          a_in_b
      <dbl> <list>      <int>
    1     1 <dbl [3]>       1
    2     2 <dbl [3]>       0
    3     3 <dbl [3]>       1
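
As for why the original attempt returned 0 in every row: without rowwise(), `b` inside mutate() is the whole list column, and `%in%` compares each value of `a` against the list elements as whole objects rather than looking inside each vector. If rowwise() is not wanted, an element-wise test with purrr is another option. The following is only a minimal sketch: the second list column `d` is hypothetical, made up here to stand in for the "multiple list columns" mentioned in the question.

```r
library(dplyr)
library(tibble)
library(purrr)

tbl <- tibble(
  a = c(1, 2, 3),
  b = list(c(1, 7, 8), c(1, 7, 8), c(1, 2, 3)),
  d = list(c(2, 3), c(2, 3), c(4, 5))  # hypothetical extra list column
)

res <- tbl %>%
  mutate(
    # map2_lgl() pairs a[i] with b[[i]] and returns one logical per row;
    # unary + then converts TRUE/FALSE to 1/0, as in the rowwise() version
    a_in_b = +map2_lgl(a, b, \(x, y) x %in% y),
    a_in_d = +map2_lgl(a, d, \(x, y) x %in% y)
  )

res
```

Each additional list column costs one more map2_lgl() line, and no rowwise()/ungroup() pair is needed. Whether this reads better than rowwise() is largely a matter of taste.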