I am running into problems using a list column as an input to the LHS of dplyr::case_when().
library("dplyr")
library("tibble")
library("purrr")
# create a tibble and add a list column
tbl = tibble(a = c(1,2,3))
(b = list(c(1,7,8), c(1,7,8),c(1,2,3)))
#> [[1]]
#> [1] 1 7 8
#>
#> [[2]]
#> [1] 1 7 8
#>
#> [[3]]
#> [1] 1 2 3
tbl$b = b
I want a new column identifying whether each value in tbl$a is in the vector of values for the same observation in the list column tbl$b.
When I try this I get c(0,0,0), but I am expecting c(1,0,1).
tbl %>% mutate(a_in_b = case_when(a %in% b ~ 1,
TRUE ~ 0))
#> # A tibble: 3 × 3
#>       a b         a_in_b
#>   <dbl> <list>     <dbl>
#> 1     1 <dbl [3]>      0
#> 2     2 <dbl [3]>      0
#> 3     3 <dbl [3]>      0
I'm not sure if this is relevant but these also give different results for reasons that are not clear to me:
tbl$a[1] %in% tbl$b[1] # evaluates as FALSE
tbl$a[1] %in% tbl$b[[1]] # evaluates as TRUE
I could use a map2() approach, e.g.
map2(tbl$a, tbl$b, \(x,y) x %in% y) # this works
However, my real-world data has multiple list columns, and the map approach seems to become overly complicated.
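Use rowwise() and then it is straightforward. Inside a row-wise mutate(), b refers to the vector stored in the current row rather than to the whole list column, so a %in% b is evaluated one row at a time.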
library(dplyr)
tbl %>%
  rowwise() %>%
  mutate(a_in_b = +(a %in% b)) %>%  # unary + converts TRUE/FALSE to 1L/0L
  ungroup()
giving
# A tibble: 3 × 3
      a b         a_in_b
  <dbl> <list>     <int>
1     1 <dbl [3]>      1
2     2 <dbl [3]>      0
3     3 <dbl [3]>      1
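As for why the original attempt returns all zeros: without rowwise(), a %in% b looks up the whole vector a in the whole list b, so each value is compared against the list elements as complete objects rather than against the numbers inside them. The same thing explains the [ versus [[ difference in the question: [ returns a list of length one, while [[ returns the numeric vector it holds. The sketch below rebuilds the example data from the question to illustrate this, and also shows a non-rowwise alternative using map2_lgl(); the names whole_column and res are only for illustration.

```r
library(dplyr)
library(purrr)
library(tibble)

# Same data as in the question
tbl <- tibble(a = c(1, 2, 3),
              b = list(c(1, 7, 8), c(1, 7, 8), c(1, 2, 3)))

# `[` keeps the list wrapper, `[[` extracts the numeric vector inside it
class(tbl$b[1])    # "list"
class(tbl$b[[1]])  # "numeric"

# Without rowwise(), all of `a` is looked up in the whole list `b`,
# so no per-row comparison takes place
whole_column <- tbl$a %in% tbl$b

# Per-row comparison without rowwise(): map2_lgl() pairs a[i] with b[[i]]
# and returns a logical vector, which as.integer() turns into 1/0
res <- tbl %>%
  mutate(a_in_b = as.integer(map2_lgl(a, b, \(x, y) x %in% y)))
```

With more than two columns, purrr's pmap_lgl() can be used in the same way by passing a list of the columns involved. On large data this may be faster than rowwise(), though that is worth checking on your own data.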