Given:
library(data.table)
dat1 <- setDT(data.frame(pat=c("A.C",".BC"),val=c(1,2)))
dat2 <- setDT(data.frame(q=c("ABC","AXC","XBC"),val2=c(10,11,12)))
I want the result to be:
dat3 <- setDT(data.frame(pat=c("A.C","A.C",".BC",".BC"),val=c(1,1,2,2),q=c("ABC","AXC","ABC","XBC"),val2=c(10,11,10,12)))
dat3
pat val q val2
1: A.C 1 ABC 10
2: A.C 1 AXC 11
3: .BC 2 ABC 10
4: .BC 2 XBC 12
In other words, a left join of dat1
to dat2
for each match of regex pattern pat
to query string q
I was wondering if this is possible with a concise data.table
merge expression, i.e.
dat1[dat2, .(pat, val, q, val2), on= .(grepl(pat,q))] # this does not work
or any other data.table
trick for efficiency. In reality dat1
is hundreds of rows and dat2
can be 10-100 thousand rows.
This sounds simple enough, but I haven't found a post that quite covers it.
The closest I've gotten to is the following:
match_pat_to_q <- function(pattern, data, data.col="q"){
ret<-lapply(pattern, function(x){
data[grepl(x,get(data.col))]
})
names(ret) <- pattern # becomes an .id column in next step
rbindlist(ret, idcol=TRUE)
}
match_pat_to_q(dat1$pat, dat2)[dat1, on=.(.id==pat)]
.id q val2 val
1: A.C ABC 10 1
2: A.C AXC 11 1
3: .BC ABC 10 2
4: .BC XBC 12 2
You can solve your problem as follows:
dat1[, dat2[grep(pat, q), .(val, x=q, val2)], by=pat]
pat val x val2
<char> <num> <char> <num>
1: A.C 1 ABC 10
2: A.C 1 AXC 11
3: .BC 2 ABC 10
4: .BC 2 XBC 12
You group dat1
by pat
. This allows to logically create one group for each pattern in dat1
. Then for each group, you filter the corresponding data from dat2
using pattern matching while selecting the variables needed at the same time.