Search code examples
rdataframemergedata.tablenon-equi-join

How would one merge 2 data frames based on a category and numeric in df1 - matching the category and finding a value falling within the range of df2?


I want to merge df1 and df2, by pulling in only the numerics from df2 into df1? This would be a nested (Lookup(xlookup) in excel, but Im having difficulty working this in r? Any info would be much appreciated.

df1 <- data.frame(Names = c("A","B","C","D","E"),
                  Rank = c("R1","R3","R4","R2","R5"),
                  Time_in_rank = c(2,4.25,3,1.5,5))

df2 <- data.frame(Time_in_rank =c(0,1,2,3,4),
                  R1 =c(20000,25000,30000,35000,40000),
                  R2 =c(45000,50000,55000,60000,65000),
                  R3 =c(70000,75000,80000,85000,90000),
                  R4 =c(95000,96000,97000,98000,100000),
                  R5 =c(105000,107000,109000,111000,112000))

Desired output
Names Time_in_rank  Rank   Salary 
A     2             R1     30000
B     4.25          R3     90000
C     3             R4     98000

Close but no cigar - Merge two data frames considering a range match between key columns

Close but still no cigar - Complex non-equi merge in R


Solution

  • You could use a tidyverse approach:

    library(dplyr)
    library(tidyr)
    
    df1 %>% 
      mutate(time_floor = floor(Time_in_rank)) %>% 
      left_join(df2 %>% 
                  pivot_longer(-Time_in_rank, names_to = "Rank", values_to = "Salary"),
                by = c("Rank", "time_floor" = "Time_in_rank")) %>% 
      select(-time_floor)
    

    This returns

      Names Rank Time_in_rank Salary
    1     A   R1         2.00  30000
    2     B   R3         4.25  90000
    3     C   R4         3.00  98000
    4     D   R2         1.50  50000
    5     E   R5         5.00     NA
    

    The main ideas are:

    • df1's Time_in_rank rounded down corresponds to Time_in_rank used in df2.
    • df2 is converted into a long format to join by time and rank.