Search code examples
rdplyrdata-cleaning

In R, how do I create a new df column based on an existing column compared to a vector?


In R, I have a data frame and a vector. The data frame has a column of dates (e.g. column A). The vector also contains dates. The dates are not necessarily continuous (i.e. a few consecutive dates may be 1/4/23, 1/17/23, 2/4/23, etc.) for either column A or the vector.

I want to create a new column in the data frame (column B) which is equal to (for each row) the minimum value of the vector that is greater than the date in column A. Perhaps a more general way of putting it, I want to create a new data frame column based on an existing column compared to a vector.

I have figured out how to do this using a function/loop, but it is not the cleanest. Is there a simpler way to do this without a loop? A dplyr solution would be ideal, as that is what I mostly use elsewhere in my code, but any help would be much appreciated. It would also be helpful to know if this is not possible without a loop. Thanks!


Solution

  • Using a rowwise mutate in dplyr, subset the vector to elements >= your date column, sort, and take the first element:

    library(dplyr)
    
    # example data
    dat <- data.frame(
      columnA = as.Date(c("2023-01-04", "2023-01-17", "2023-02-04"))
    )
    vec <- as.Date(c("2023-01-01", "2023-03-01", "2023-01-04", "2023-01-30"))
    
    dat %>%
      rowwise() %>%
      mutate(columnB = first(sort(vec[vec >= columnA]))) %>%
      ungroup()
    
    # A tibble: 3 × 2
      columnA    columnB   
      <date>     <date>    
    1 2023-01-04 2023-01-04
    2 2023-01-17 2023-01-30
    3 2023-02-04 2023-03-01