I have a dataframe that is formatted as follows:
id date
1 1 2021-07-06
2 2 2021-07-10
3 4 2021-07-14
4 5 2021-07-18
5 8 2021-07-22
6 10 2021-07-26
7 11 2021-07-30
8 1 2021-08-03
9 2 2021-08-07
10 3 2021-08-11
11 5 2021-08-15
12 8 2021-08-19
13 9 2021-08-23
14 10 2021-08-27
15 12 2021-08-31
16 1 2021-09-04
17 3 2021-09-08
18 4 2021-09-12
19 6 2021-09-16
20 7 2021-09-20
21 8 2021-09-24
and I am looking to create a new data frame with a row for each ID that contains dummy variables indicating whether or not it appears in each month in the first data frame, resulting in the following:
id jul aug sep
1 1 1 1 1
2 2 1 1 0
3 3 0 1 1
4 4 1 0 1
5 5 1 1 0
6 6 0 0 1
7 7 0 0 1
8 8 1 1 1
9 9 0 1 0
10 10 1 1 0
11 11 1 0 0
12 12 0 1 0
How might I do this? Here is code to generate the first data frame:
> dput(df_start)
structure(list(id1 = c(1, 2, 4, 5, 8, 10, 11, 1, 2, 3, 5, 8,
9, 10, 12, 1, 3, 4, 6, 7, 8), date = structure(c(18814, 18818,
18822, 18826, 18830, 18834, 18838, 18842, 18846, 18850, 18854,
18858, 18862, 18866, 18870, 18874, 18878, 18882, 18886, 18890,
18894), class = "Date")), class = "data.frame", row.names = c(NA,
-21L))
You can extract the month and then pivot_wider
library(dplyr)
library(tidyr)
df%>%
mutate(date=format(date, "%b"),
value=1) %>%
pivot_wider(names_from=date,
values_fn = sum, # in case an id has >1 per mo
values_fill=0)
# A tibble: 12 x 4
id1 Jul Aug Sep
<dbl> <dbl> <dbl> <dbl>
1 1 1 1 1
2 2 1 1 0
3 4 1 0 1
4 5 1 1 0
5 8 1 1 1
6 10 1 1 0
7 11 1 0 0
8 3 0 1 1
9 9 0 1 0
10 12 0 1 0
11 6 0 0 1
12 7 0 0 1