Search code examples
rdataframedplyrsummarize

Using dplyr to summarize summed proportions by groups


I've been trying to summarize data by multiple groups, where the new column should be a summary of the proportion of one column to another, by these groups. Because these two columns never both contain a value, their proportions cannot be calculated per row. Below is an example.

By, P_Common and Number7 groups, I'd like the total N_count/A_count

structure(list(P_Common = c("B", "B", "C", "C", "D", "E", "E", 
"F", "G", "G", "B", "G", "E", "D", "F", "C"), Number_7 = c(1L, 
1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 1L, 3L, 1L, 2L, 1L, 1L), 
    N_count = c(0L, 4L, 22L, NA, 7L, 0L, 44L, 16L, NA, NA, NA, 
    NA, NA, NA, NA, NA), A_count = c(NA, NA, NA, NA, NA, NA, 
    NA, NA, 0L, 4L, 7L, NA, 23L, 4L, 7L, 17L)), class = "data.frame", row.names = c(NA, 
-16L))


P_Common    Number_7    N_count A_count
B           1           0       NA
B           1           4       NA
C           1           22      NA
C           1           NA      NA
D           2           7       NA
E           2           0       NA
E           2           44      NA
F           2           16      NA
B           1           NA      7
G           3           NA      NA
E           1           NA      23
D           2           NA      4
F           1           NA      7
C           1           NA      17

In this example there'd be quite some 0 / NA values but that's okay, they can stay in, but overall it would become like

P_Common    Number_7   Propo
B           1          0.571428571
C           1          1.294117647
D           2          1.75
... etc

Solution

  • You can do:

    df %>% 
      group_by(P_Common, Number_7) %>% 
      summarise(Propo = sum(N_count, na.rm = T) / sum(A_count, na.rm = T))
    
      P_Common Number_7   Propo
      <chr>       <int>   <dbl>
    1 B               1   0.571
    2 C               1   1.29 
    3 D               2   1.75 
    4 E               1   0    
    5 E               2 Inf    
    6 F               1   0    
    7 F               2 Inf    
    8 G               3   0