Search code examples
rdatelubridate

Create single year variables corresponding to a range of dates in R


I would like to be able to take two variables representing a starting date and an ending date and create variables indicating which years are covered over the range of those two dates.

What I have:

df1 <- data.frame(ID = c("A", "B", "C"),
                 Start_Date = c("3/5/2004", "8/22/2005", "4/8/2008"), 
                 End_Date = c("6/25/2009","11/2/2006", "6/9/2011"))

What I want:

df2 <- data.frame(ID = c("A", "B", "C"),
                 Start_Date = c("3/5/2004", "8/22/2005", "4/8/2008"), 
                 End_Date = c("6/25/2009","11/2/2006", "6/9/2011"),
                 y2004 = c(1, 0, 0), 
                 y2005 = c(1, 1, 0), 
                 y2006 = c(1, 1, 0), 
                 y2007 = c(1, 0, 0), 
                 y2008 = c(1, 0, 1), 
                 y2009 = c(0, 0, 1), 
                 y2010 = c(0, 0, 1), 
                 y2011 = c(0, 0, 1))

As above, each new year variable indicates whether or not the year is captured in the range of the two date variables "Start_Date" and "End_Date".

Any ideas would be greatly appreciated. Thanks in advance.


Solution

  • One method is to pivot to 'long' format, extract the year part after converting to Date class, then get the seq (:) from the first to last grouped by 'ID' and reshape back to 'wide', then join with the original data by 'ID'

    library(dplyr)
    library(tidyr)
    library(stringr)
    library(lubridate)
    df1 %>%
         pivot_longer(cols = -ID) %>%
         group_by(ID) %>% 
         summarise(year = str_c('y', year(mdy(value)[1]):year(mdy(value)[2])),
                n = 1, .groups = 'drop') %>% 
         pivot_wider(names_from = year, values_from = n, values_fill = 0) %>% 
         left_join(df1, .)
    

    -output

    #   ID Start_Date  End_Date y2004 y2005 y2006 y2007 y2008 y2009 y2010 y2011
    #1  A   3/5/2004 6/25/2009     1     1     1     1     1     1     0     0
    #2  B  8/22/2005 11/2/2006     0     1     1     0     0     0     0     0
    #3  C   4/8/2008  6/9/2011     0     0     0     0     1     1     1     1