Search code examples
rdataframeapplysapply

R: check if the column value falls between a range


I have two data frames.

Frame-1

Dataframe One

Frame-2

Dataframe 2

  • Columns From and to of data.frame.2 are the ranges in which column Step of the data.frame.1 are present.

I want to combine these two data frames using the Steps with their range (From and to), and then assign them new_id. Both data frames have different numbers of rows.

Example

# DataFrame1
df1 <- data.frame(Step = c(1:10), id = paste0("id_", c(1:10)))

# DataFrame2
df2 <- data.frame(from = seq(1,10,2), to = seq(3,12,2), new_id = paste0("newLabel_", c(1:5)))

What I tried

df1$label <- ifelse(sapply(df1$Step, function(p) 
                 any(df2$from > p & df2$to < p)), 
                 df2$new_id, NA)

Error

I get NAs instead of labels.

enter image description here


Solution

  • With dplyr version >= 1.1.0, we can use non-equi join with join_by

    library(dplyr)
    left_join(df1, df2, by = join_by(closest(Step >= from), closest(Step <= to))) %>%
       select(names(df1), new_id)
    

    -output

     Step    id     new_id
    1     1  id_1 newLabel_1
    2     2  id_2 newLabel_1
    3     3  id_3 newLabel_2
    4     4  id_4 newLabel_2
    5     5  id_5 newLabel_3
    6     6  id_6 newLabel_3
    7     7  id_7 newLabel_4
    8     8  id_8 newLabel_4
    9     9  id_9 newLabel_5
    10   10 id_10 newLabel_5
    

    Or using data.table

    library(data.table)
    setDT(df1)[df2, new_id := i.new_id, on = .(Step >= from, Step <= to)]
    

    -output

    > df1
        Step    id     new_id
     1:    1  id_1 newLabel_1
     2:    2  id_2 newLabel_1
     3:    3  id_3 newLabel_2
     4:    4  id_4 newLabel_2
     5:    5  id_5 newLabel_3
     6:    6  id_6 newLabel_3
     7:    7  id_7 newLabel_4
     8:    8  id_8 newLabel_4
     9:    9  id_9 newLabel_5
    10:   10 id_10 newLabel_5