I have repeated-measures data (several rows per `patid`) that looks like the table below. Code to create `df`:
```r
df <- structure(list(
  patid = c("1", "1", "1", "1", "2", "2", "3", "3", "3", "4", "4", "4", "4"),
  observation_date = c("07/07/2016", "07/08/2016", "07/11/2016", "07/07/2019",
                       "07/05/2015", "02/12/2016", "07/05/2015", "07/06/2015",
                       "16/06/2015", "07/05/2015", "02/12/2016", "18/12/2016",
                       "15/01/2017"),
  registration = c("0", "0", "1", "1", "0", "1", "0", "0", "0", "0", "1", "1", "1")
), class = "data.frame", row.names = c(NA, -13L))
```
patid | observation_date | registration |
---|---|---|
1 | 07/07/2016 | 0 |
1 | 07/08/2016 | 0 |
1 | 07/11/2016 | 1 |
1 | 07/07/2019 | 1 |
2 | 07/05/2015 | 0 |
2 | 02/12/2016 | 1 |
3 | 07/05/2015 | 0 |
3 | 07/06/2015 | 0 |
3 | 16/06/2015 | 0 |
4 | 07/05/2015 | 0 |
4 | 02/12/2016 | 1 |
4 | 18/12/2016 | 1 |
4 | 15/01/2017 | 1 |
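`observation_date` is stored as character in dd/mm/yyyy form (values such as `16/06/2015` show that the day comes first). Sorting those strings, as `arrange(observation_date)` in the attempt below does, orders them by day rather than chronologically. This does not change the counts, only the row order. A minimal base R sketch using `patid` 4's dates; the variable names are illustrative:

```r
# patid 4's observation dates as stored in df (dd/mm/yyyy strings)
dates_chr <- c("07/05/2015", "02/12/2016", "18/12/2016", "15/01/2017")

# Sorting the strings compares them character by character, i.e. by day first
lex_order <- sort(dates_chr)
print(lex_order)

# Parsing with an explicit format gives Date values that sort chronologically
dates <- as.Date(dates_chr, format = "%d/%m/%Y")
chrono_order <- sort(dates)
print(format(chrono_order, "%d/%m/%Y"))
```

The same conversion works inside a pipeline, e.g. `mutate(observation_date = as.Date(observation_date, format = "%d/%m/%Y"))` before `arrange()`.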
I would like to summarize the data so that each `patid` has exactly two rows: one with the number of observation dates where `registration` is 0, and one with the number where `registration` is 1. This is the code I tried, but it only gives the total count of `observation_date` per `patid`:
```r
library(dplyr)

# Groups only by patid, so tally() counts all rows per patient
cons_sum <- df %>%
  group_by(patid) %>%
  arrange(observation_date) %>%
  tally()
```
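A minimal sketch of the obvious next step, assuming dplyr is installed: adding `registration` to `group_by()` splits the counts correctly, but `tally()` only returns combinations that actually occur in the data, so `patid` 3 gets a single row and its zero count for `registration` 1 is missing. The data frame here is a cut-down copy of `df` with just the two columns needed, and `partial` is an illustrative name:

```r
library(dplyr)

# Cut-down copy of df: only the grouping columns matter for the count
df <- data.frame(
  patid = c("1", "1", "1", "1", "2", "2", "3", "3", "3", "4", "4", "4", "4"),
  registration = c("0", "0", "1", "1", "0", "1", "0", "0", "0", "0", "1", "1", "1")
)

# Grouping by both columns counts rows per patid/registration pair,
# but pairs that never occur (patid 3, registration 1) are absent
partial <- df %>%
  group_by(patid, registration) %>%
  tally() %>%
  ungroup()

print(partial)
```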
The final table should look like this:
patid | registration | count |
---|---|---|
1 | 0 | 2 |
1 | 1 | 2 |
2 | 0 | 1 |
2 | 1 | 1 |
3 | 0 | 3 |
3 | 1 | 0 |
4 | 0 | 1 |
4 | 1 | 3 |
Use `count()`. To make every possible combination appear in the final table, including zero counts such as `patid` 3 with `registration` 1, convert `registration` to a factor and set `.drop = FALSE`:
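```r
library(dplyr)

# factor() fixes the set of registration levels; .drop = FALSE keeps
# level combinations with no rows, so they appear with n = 0
df %>%
  count(patid, registration = factor(registration), .drop = FALSE)
```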
Output:

```
  patid registration n
1     1            0 2
2     1            1 2
3     2            0 1
4     2            1 1
5     3            0 3
6     3            1 0
7     4            0 1
8     4            1 3
```
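If you would rather avoid dplyr, base R's `table()` is one alternative (not part of the answer above): it cross-tabulates every observed `patid` against every observed `registration` value, so zero counts appear without an explicit factor conversion. A sketch using a cut-down copy of `df`; the name `counts` is illustrative:

```r
# Cut-down copy of df: only the columns being counted
df <- data.frame(
  patid = c("1", "1", "1", "1", "2", "2", "3", "3", "3", "4", "4", "4", "4"),
  registration = c("0", "0", "1", "1", "0", "1", "0", "0", "0", "0", "1", "1", "1"),
  stringsAsFactors = FALSE
)

# table() counts every patid x registration combination, including zeros;
# as.data.frame() turns the 2-D table back into one row per combination
counts <- as.data.frame(
  table(patid = df$patid, registration = df$registration),
  responseName = "n"
)

# Put each patid's two rows next to each other, as in the desired output
counts <- counts[order(counts$patid, counts$registration), ]
rownames(counts) <- NULL
print(counts)
```

Here `patid` and `registration` come back as factors; wrap them in `as.character()` if you need the original type. As with the `count()` version, only values that occur somewhere in the data become rows.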