I have a data table:
dt <- data.table(col1=c('aa,bb', 'bb,cc,ee', 'dd,ee'), col2=c('aa', 'cc', 'aa'))
> dt
col1 col2
1: aa,bb aa
2: bb,cc,ee cc
3: dd,ee aa
I want to check if column 2 occurs in the strsplit of column one, so for the first row if aa
is present in aa,bb
split by a comma, which is true. It's also true for the second row, and false for the third. I only want to keep the rows where this occurs, so only row 1 and 2.
My first thought was doing it like this:
dt[col2 %in% strsplit(col1, ',')]
However, that returns an empty data.table.
I can think of multiple solutions to solve this, including making new columns using tstrsplit
, or melting the data table, but all of these are a bit tedious for such a seemingly simple task. Any suggestions?
We can use str_detect
from stringr
library(stringr)
dt[, flag := str_detect(col1, col2)]
dt
# col1 col2 flag
#1: aa,bb aa TRUE
#2: bb,cc,ee cc TRUE
#3: dd,ee aa FALSE
Also, to avoid any substring matches, we can specify the word boundary (\\b
)
dt[, str_detect(col1, str_c("\\b", col2, "\\b"))]
#[1] TRUE TRUE FALSE
Regarding the use of strsplit
, the output would be a list
of vector
s. So, we need to use a function that checks the values of 'col1' are in the corresponding elements of list
. Map
does that
dt[, unlist(Map(`%in%`, col2, strsplit(col1, ",")))]
To apply the filter in the same step and return the 2 row data.table:
dt[unlist(Map(`%in%`, col2, strsplit(col1, ",")))]