Search code examples
rdplyrtidyverserepeatlongitudinal

how to create a count variable that counts based on the value of a binary variable in R using dplyr


code to create df: I have the repeat data that looks like the table below.

df <- structure(list(patid = c("1", "1", "1", "1", "2", "2", "3", "3", 
                         "3", "4", "4", "4", "4"), observation_date = c("07/07/2016", 
                                                                        "07/08/2016", "07/11/2016", "07/07/2019", "07/05/2015", "02/12/2016", 
                                                                        "07/05/2015", "07/06/2015", "16/06/2015", "07/05/2015", "02/12/2016", 
                                                                        "18/12/2016", "15/01/2017"),
                      registration = c("0","0","1","1","0","1","0","0","0","0","1","1","1")), class = "data.frame", row.names = c(NA, 
                                                                                                                                          -13L))
patid observation_date registration
1 07/07/2016 0
1 07/08/2016 0
1 07/11/2016 1
1 07/07/2019 1
2 07/05/2015 0
2 02/12/2016 1
3 07/05/2015 0
3 07/06/2015 0
3 16/06/2015 0
4 07/05/2015 0
4 02/12/2016 1
4 18/12/2016 1
4 15/01/2017 1

I would like to summarize data in this way: I want two rows only for each patid, one row will have a count of the observation_date when value of registration=0 and one row will have a count of the observation_date when value of registration=1. This is the code I tried but it is only giving a count of all observation_date per patid

cons_sum <- df%>%
         group_by(patid) %>%
         arrange(observation_date)%>%
         tally()

The final table should look like this:

patid registration count
1 0 2
1 1 2
2 0 1
2 1 1
3 0 4
3 1 0
4 0 1
4 1 3

Solution

  • Use count. To make every possible value appear in the final table, you should convert your registration column to a factor:

    df %>% 
      count(patid, registration = factor(registration), .drop = FALSE)
    

    output

      patid registration n
    1     1            0 2
    2     1            1 2
    3     2            0 1
    4     2            1 1
    5     3            0 3
    6     3            1 0
    7     4            0 1
    8     4            1 3