I have two data frames.
Frame-1
Frame-2
From
and to
of data.frame.2 are the ranges in which column Step
of the data.frame.1 are present.I want to combine these two data frames using the Steps
with their range (From
and to
), and then assign them new_id
. Both data frames have different numbers of rows.
Example
# DataFrame1
df1 <- data.frame(Step = c(1:10), id = paste0("id_", c(1:10)))
# DataFrame2
df2 <- data.frame(from = seq(1,10,2), to = seq(3,12,2), new_id = paste0("newLabel_", c(1:5)))
What I tried
df1$label <- ifelse(sapply(df1$Step, function(p)
any(df2$from > p & df2$to < p)),
df2$new_id, NA)
Error
I get NAs
instead of labels.
With dplyr
version >= 1.1.0, we can use non-equi join with join_by
library(dplyr)
left_join(df1, df2, by = join_by(closest(Step >= from), closest(Step <= to))) %>%
select(names(df1), new_id)
-output
Step id new_id
1 1 id_1 newLabel_1
2 2 id_2 newLabel_1
3 3 id_3 newLabel_2
4 4 id_4 newLabel_2
5 5 id_5 newLabel_3
6 6 id_6 newLabel_3
7 7 id_7 newLabel_4
8 8 id_8 newLabel_4
9 9 id_9 newLabel_5
10 10 id_10 newLabel_5
Or using data.table
library(data.table)
setDT(df1)[df2, new_id := i.new_id, on = .(Step >= from, Step <= to)]
-output
> df1
Step id new_id
1: 1 id_1 newLabel_1
2: 2 id_2 newLabel_1
3: 3 id_3 newLabel_2
4: 4 id_4 newLabel_2
5: 5 id_5 newLabel_3
6: 6 id_6 newLabel_3
7: 7 id_7 newLabel_4
8: 8 id_8 newLabel_4
9: 9 id_9 newLabel_5
10: 10 id_10 newLabel_5