Search code examples
roptimizationextractdata-manipulationcoding-efficiency

Optimize the creation of a matrix/data frame


I am trying to create a new matrix/data frame using a list of values that match the values of a column in the old data frame. Also for the new matrix/data frame I want to preserve the order from the list of values used for match. Here is an example of what I want to achieve:

#A list of values used for matching
time.new <- c(2, 3, 4, 3, 4, 5, 4, 5, 6)
#The old data frame which I would match on the column of time.old
old <- data.frame(time.old=1:10, y=rnorm(10))

I got a solution (see below) by using merge and order but I want to avoid merge and order because they really slow things down and I have a much larger dataset. Matrix outcome is preferred as data frame can also be slow (for further manipulations) So any ideas will be appreciated!

time.new <- data.frame(id = 1:length(time.new), time=time.new)
new_dataframe <- merge(x = time.new, y = old, by.x = "time", by.y="time.old", all.x = TRUE)
new_dataframe <- new_dataframe[order(new_dataframe$id), ]
new_dataframe$id <- NULL

Solution

  • We can use match to join time.new and time.old and get corresponding y value.

    set.seed(123)
    time.new <- c(2, 3, 4, 3, 4, 5, 4, 5, 6)
    old <- data.frame(time.old=1:10, y=rnorm(10))
    cbind(time = time.new, y = old$y[match(time.new, old$time.old)])
    
    #      time       y
    # [1,]    2 -0.2302
    # [2,]    3  1.5587
    # [3,]    4  0.0705
    # [4,]    3  1.5587
    # [5,]    4  0.0705
    # [6,]    5  0.1293
    # [7,]    4  0.0705
    # [8,]    5  0.1293
    # [9,]    6  1.7151