Search code examples
rdplyrpanel

R dplyr: tally to create Panel Data


I have the following dataframe :

D <- data.frame("Id" = c("a","b","c","d","e","f","g"), "Group" = c("1","1","1","2","2","2","2"),"Time" = c("1","1","2","1","2","3","3"))

  Id Group Time
1  a     1    1
2  b     1    1
3  c     1    2
4  d     2    1
5  e     2    2
6  f     2    3
7  g     2    3

I want to count the number of individuals by period and time keeping a cylinder structure. The classical way to do it uses dplyr

D %>% group_by(Group,Time) %>% tally()
  Group Time      n
  <fct> <fct> <int>
1 1     1         2
2 1     2         1
3 2     1         1
4 2     2         1
5 2     3         2

but structure is not balanced : time 3 doesn't show up for group 1 here but I'd like to see it associated with a 0, like this:

  Group Time      n
  <fct> <fct> <int>
1 1     1         2
2 1     2         1
3 1     3         0
4 2     1         1
5 2     2         1
6 2     3         2

Is-there a way to balance the results after a group_by ? Has someone faced something similar ? Thanks in advance


Solution

  • Since Time is factor variable we can use count with .drop = FALSE as by default count drops observation with 0 counts.

    library(dplyr)
    D %>% count(Group, Time, .drop = FALSE)
    
    #  Group Time      n
    #  <fct> <fct> <int>
    #1 1     1         2
    #2 1     2         1
    #3 1     3         0
    #4 2     1         1
    #5 2     2         1
    #6 2     3         2
    

    We can use the same method using tally as well.

    D %>% group_by(Group,Time, .drop = FALSE) %>% tally()
    

    Or with complete

    D %>%  count(Group, Time) %>% tidyr::complete(Group, Time, fill = list(n = 0))