Search code examples
rdataframedummy-variable

Is there a way to create dummy variables for years that fall between two time points?


I am working with some time series data, where each row is an observation of a person, and I have two time periods, the start date and the end date. I am trying to create dummy variables for each year, such that if the year falls between the start date and the end date, the dummy is coded as 1.

The end result is to use this for data visualization purposes on demographics by year.

I've looked at some packages, but it seems to create dummies from variables already provided. Since some of the years may be missing from one of the columns, I'm trying to find an alternate option.

id <- c(1:3)
start.date <- c(1990, 1850, 1910)
end.date <- c(2014, 1920, 1980)

df <- data.frame(id, start.date, end.date)

df

As you can see from the structure of the data, I would like individual 1, for instance, to have the dummies coded between 1990 and 2014 as 1, and 0 otherwise.


Solution

  • If I understand correctly, you want a dataframe with all years for every id -

    library(dplyr)
    library(tidyr)
    
    df %>% 
      group_by(id) %>% 
      transmute(years = list(paste0("Y", start.date:end.date)), value = 1) %>% 
      unnest() %>% 
      ungroup() %>% 
      spread(years, value, fill = 0)
    
    # showing first 10 of total 157 columns
    # A tibble: 3 x 10
         id Y1850 Y1851 Y1852 Y1853 Y1854 Y1855 Y1856 Y1857 Y1858
      <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
    1     1     0     0     0     0     0     0     0     0     0
    2     2     1     1     1     1     1     1     1     1     1
    3     3     0     0     0     0     0     0     0     0     0