
Is there a more efficient way to calculate the difference in months in R?


I have a large panel data frame (201,720 rows, 3 columns) that looks like this:

Name <- c("A", "A", "A", "B", "B", "B")

Inception <- c(as.Date("2007-12-31"), as.Date("2007-12-31"), as.Date("2007-12-31"),
               as.Date("1990-12-31"), as.Date("1990-12-31"), as.Date("1990-12-31"))
 
Months <- c(as.Date("2010-01-01"), as.Date("2010-02-01"), as.Date("2010-03-01"),
            as.Date("2010-01-01"), as.Date("2010-02-01"), as.Date("2010-03-01"))

df <- data.frame(Name, Inception, Months)

For each row I want the number of whole months between «Inception» and «Months», stored in a new column named «Age». If «Inception» is later than «Months» (a negative difference), «Age» should be NA. The loop below works, but it is slow on the full data frame.

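library(lubridate)  # provides interval() and months()

for (i in seq_len(nrow(df))) {
  if (df[i, 2] > df[i, 3]) {
    # Inception is after Months: negative difference, store NA
    df[i, "Age"] <- NA
  } else {
    # whole months between Inception (column 2) and Months (column 3)
    df[i, "Age"] <- interval(df[i, 2],
                             df[i, 3]) %/% months(1)
  }
}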
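What the loop computes for a single row can be checked directly. `interval(start, end) %/% months(1)` counts complete months, which is not the same as the plain calendar-month difference. A minimal sketch for the first example row (the dates come from the example data; the variable names `start`, `end`, `naive` and `whole` are only illustrative):

```r
library(lubridate)

# First row of the example: Inception 2007-12-31, Months 2010-01-01
start <- as.Date("2007-12-31")
end   <- as.Date("2010-01-01")

# Plain calendar-month difference: Dec 2007 -> Jan 2010 is 25 months
naive <- (as.POSIXlt(end)$year - as.POSIXlt(start)$year) * 12 +
  (as.POSIXlt(end)$mon - as.POSIXlt(start)$mon)

# Complete months in the interval: a 25th month would end on 2010-01-31,
# which is after 2010-01-01, so only 24 complete months fit
whole <- interval(start, end) %/% months(1)

naive  # 25
whole  # 24, the Age value shown for row 1 in the output
```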

Is there a more efficient way to calculate this difference?


Solution

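  • Vectorize the calculation with dplyr's case_when(): rows where Inception <= Months get the whole-month difference, and every other row falls through to NA, which is what case_when() returns when no condition matches.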

    library(dplyr)
    library(lubridate)
    df <- df %>%
      mutate(Age = case_when(
        # whole months from Inception to Months; unmatched rows become NA
        Inception <= Months ~ interval(Inception, Months) %/% months(1)
      ))
    

    Output:

    df
      Name  Inception     Months Age
    1    A 2007-12-31 2010-01-01  24
    2    A 2007-12-31 2010-02-01  25
    3    A 2007-12-31 2010-03-01  26
    4    B 1990-12-31 2010-01-01 228
    5    B 1990-12-31 2010-02-01 229
    6    B 1990-12-31 2010-03-01 230
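If adding dplyr and lubridate is undesirable, the same rule can be vectorized in base R by working on the year, month and day-of-month fields of the dates. This swaps in a hand-written month count in place of lubridate's `interval() %/% months(1)`; `whole_months()` is a hypothetical helper, not part of any package. It reproduces the Age values in the output above, but it is only assumed to agree with lubridate in general and may differ for month-end start dates (for example a 31st followed by a shorter month).

```r
# Hypothetical helper: whole months from `start` to `end`, base R only.
# Assumption: a month counts only once the day of month of `start` has been
# reached in `end`; this matches interval() %/% months(1) on the example rows,
# but month-end edge cases have not been checked against lubridate.
whole_months <- function(start, end) {
  s <- as.POSIXlt(start)
  e <- as.POSIXlt(end)
  # Calendar-month difference, e.g. Dec 2007 -> Jan 2010 gives 25
  m <- (e$year - s$year) * 12 + (e$mon - s$mon)
  # Drop the last month if its day of month has not been reached yet
  m <- m - (e$mday < s$mday)
  # Inception after Months: the difference would be negative, so use NA
  m[start > end] <- NA
  m
}

df <- data.frame(
  Name      = c("A", "A", "A", "B", "B", "B"),
  Inception = as.Date(rep(c("2007-12-31", "1990-12-31"), each = 3)),
  Months    = as.Date(rep(c("2010-01-01", "2010-02-01", "2010-03-01"), 2))
)

# One call on whole columns, no per-row loop
df$Age <- whole_months(df$Inception, df$Months)
df$Age  # 24 25 26 228 229 230

# A row where Inception is after Months gets NA
whole_months(as.Date("2011-05-15"), as.Date("2010-01-01"))  # NA
```

Because every step operates on whole columns, this avoids the per-row indexing of the original loop; its speed on the real 201,720-row panel has not been measured here.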