Search code examples
rextractdata-manipulation

Create new data frame based on the match of values between a list and a column of data frame


I am trying to create a new data frame using a list of values that match the values of a column in the old data frame. Also for the new data frame I want to preserve the order from the list of values used for match. Here is an example of what I want to achieve:

#A list of values used for matching
time.new <- c(2, 3, 4, 3, 4, 5, 4, 5, 6)
#The old data frame which I would match on the column of **time.old**
old <- data.frame(time.old=1:10, y=rnorm(10))
   time.old        y
          1  0.20320
          2 -0.74696
          3 -0.73716
          4 -0.61959
          5  1.12733
          6  2.58322
          7 -0.08138
          8 -0.10436
          9 -0.13081
         10 -1.20050
#Here is the expected new data frame
       time        y
          2 -0.74696
          3 -0.73716
          4 -0.61959
          3 -0.73716
          4 -0.61959
          5  1.12733
          4 -0.61959
          5  1.12733
          6  2.58322

Solution

  • Try the left_join from dplyr. First, convert time.new to a column of a data frame:

    library(tidyverse)
    time.new <- c(2, 3, 4, 3, 4, 5, 4, 5, 6)
    #The old data frame which I would match on the column of **time.old**
    old <- data.frame(time.old=1:10, y=rnorm(10))
    
    time.new <- data.frame(time=time.new) 
    new_dataframe <- left_join(time.new, old, by=c("time"="time.old"))
    

    In base R use merge:

    merge(x = time.new, y = old, by.x = "time", by.y="time.old", all.x = TRUE)
    

    If you want to preserve the order of time.new you need to add an auxiliary row number column to your data, merge, order on row number and delete the id column:

    time.new <- c(2, 3, 4, 3, 4, 5, 4, 5, 6)
    old <- data.frame(time.old=1:10, y=rnorm(10))
        
    time.new <- data.frame(id = 1:length(time.new), time=time.new)
    
    new_dataframe <- merge(x = time.new, y = old, by.x = "time", by.y="time.old", all.x = TRUE)
    new_dataframe <- new_dataframe[order(new_dataframe$id), ]
    new_dataframe$id <- NULL