Search code examples
rdataframedatedplyrreshape

Creating dummy variables indicating whether IDs occur within monthly data


I have a dataframe that is formatted as follows:

    id       date
1    1 2021-07-06
2    2 2021-07-10
3    4 2021-07-14
4    5 2021-07-18
5    8 2021-07-22
6   10 2021-07-26
7   11 2021-07-30
8    1 2021-08-03
9    2 2021-08-07
10   3 2021-08-11
11   5 2021-08-15
12   8 2021-08-19
13   9 2021-08-23
14  10 2021-08-27
15  12 2021-08-31
16   1 2021-09-04
17   3 2021-09-08
18   4 2021-09-12
19   6 2021-09-16
20   7 2021-09-20
21   8 2021-09-24

and I am looking to create a new data frame with a row for each ID that contains dummy variables indicating whether or not it appears in each month in the first data frame, resulting in the following:

    id jul aug sep
1    1   1   1   1
2    2   1   1   0
3    3   0   1   1
4    4   1   0   1
5    5   1   1   0
6    6   0   0   1
7    7   0   0   1
8    8   1   1   1
9    9   0   1   0
10  10   1   1   0
11  11   1   0   0
12  12   0   1   0

How might I do this? Here is code to generate the first data frame:

> dput(df_start)
structure(list(id1 = c(1, 2, 4, 5, 8, 10, 11, 1, 2, 3, 5, 8, 
9, 10, 12, 1, 3, 4, 6, 7, 8), date = structure(c(18814, 18818, 
18822, 18826, 18830, 18834, 18838, 18842, 18846, 18850, 18854, 
18858, 18862, 18866, 18870, 18874, 18878, 18882, 18886, 18890, 
18894), class = "Date")), class = "data.frame", row.names = c(NA, 
-21L))

Solution

  • You can extract the month and then pivot_wider

    library(dplyr)
    library(tidyr)
    
    df%>%
      mutate(date=format(date, "%b"),
             value=1) %>%
      pivot_wider(names_from=date,
                  values_fn = sum, # in case an id has >1 per mo
                  values_fill=0)
    
    # A tibble: 12 x 4
         id1   Jul   Aug   Sep
       <dbl> <dbl> <dbl> <dbl>
     1     1     1     1     1
     2     2     1     1     0
     3     4     1     0     1
     4     5     1     1     0
     5     8     1     1     1
     6    10     1     1     0
     7    11     1     0     0
     8     3     0     1     1
     9     9     0     1     0
    10    12     0     1     0
    11     6     0     0     1
    12     7     0     0     1