Search code examples
rdplyrdata-munging

Replace values in ordered columns named by year, from columns indicating range


I want to recode several year columns using a range specified by 'StartYear' and 'CloseYear'.

What's an elegant way to get from here:

library(tibble); library(dplyr)

(df <- tibble(id = c(1,2,3, 4),
              `1997` = c(1,0,0, 1), 
              `1998` = c(0,1,0, 0), 
              `1999` = c(0,0,1, 0),
              `2000` = c(0, 0, 1, 1),
              StartYear = c(1998, 1997, 1998, 1998),
              CloseYear = c(1999, 1997, 2000, 1999)))
#> # A tibble: 4 x 7
#>      id `1997` `1998` `1999` `2000` StartYear CloseYear
#>   <dbl>  <dbl>  <dbl>  <dbl>  <dbl>     <dbl>     <dbl>
#> 1     1      1      0      0      0      1998      1999
#> 2     2      0      1      0      0      1997      1997
#> 3     3      0      0      1      1      1998      2000
#> 4     4      1      0      0      1      1998      1999

To here:

(tibble(id = c(1,2,3, 4),
              `1997` = c(0, 1, 0, 0), 
              `1998` = c(1, 0, 1, 1), 
              `1999` = c(1, 0, 1, 1),
              `2000` = c(0, 0, 1, 0),
              StartYear = c(1998, 1997, 1998, 1998),
              CloseYear = c(1999, 1997, 2000, 1999)))
#> # A tibble: 4 x 7
#>      id `1997` `1998` `1999` `2000` StartYear CloseYear
#>   <dbl>  <dbl>  <dbl>  <dbl>  <dbl>     <dbl>     <dbl>
#> 1     1      0      1      1      0      1998      1999
#> 2     2      1      0      0      0      1997      1997
#> 3     3      0      1      1      1      1998      2000
#> 4     4      0      1      1      0      1998      1999

And is there a decent way using dplyr / the dplyr::mutate functions?


Solution

  • One possible tidyverse approach. Gather, mutate, then spread...

    library(tidyverse)
    df %>% 
      gather(year, value, -id, -StartYear, -CloseYear, convert = TRUE) %>%
      mutate(value = as.integer(StartYear <= year & year <= CloseYear)) %>% 
      spread(year, value)
    #> # A tibble: 4 x 7
    #>      id StartYear CloseYear `1997` `1998` `1999` `2000`
    #>   <dbl>     <dbl>     <dbl>  <int>  <int>  <int>  <int>
    #> 1     1      1998      1999      0      1      1      0
    #> 2     2      1997      1997      1      0      0      0
    #> 3     3      1998      2000      0      1      1      1
    #> 4     4      1998      1999      0      1      1      0