Search code examples
rdplyrcountsummarizegroup

Create count table for specific condition and then add column that creates count by group as a whole in R


I have a table like this:

data1 <- data.frame("State" = c("NJ", "NJ", "PA", "NJ", "TX"), "Filter" = c("Filter", "Filter", 
"No Filter", "Filter", "Filter"), "Threshold" = c("Exceeds","Exceeds", NA, "NL", "Exceeds"))

I'd like to create a count table that groups by State and Filter and then counts the number of times the Threshold is "Exceeds" and put that value in a new column. And then count the number of times a certain State and Filter combination occurs and put it in another column. An example of what I'm looking for is below.

final_data <- data.frame("State" = c("NJ", "NJ", "PA", "NJ", "TX"), "Filter" = c("Filter", 
"Filter", "No Filter", "Filter", "Filter"), "Threshold" = c("Exceeds", "Exceeds", NA, "NL", 
"Exceeds"), Count_Exceeds_Threshold = c(2, 2, 0, 0, 1), Count_Total = c(3, 3, 1, 3, 1))

I've tried figuring this out with group_by and tally() in dplyr, but I can't get it to work the way I want.

Thank you!!


Solution

  • You can use mutate and .by for inline grouping, and count the number of times Threshold == "Exceeds". n() is used to get the number of rows by group.

    library(dplyr)
    data1 %>% 
      mutate(Count_Exceeds_Threshold = sum(Threshold == "Exceeds", na.rm = TRUE),
             Count_Total = n(), .by = c(State, Filter))
    
    #   State    Filter Threshold Count_Exceeds_Threshold Count_Total
    # 1    NJ    Filter   Exceeds                       2           3
    # 2    NJ    Filter   Exceeds                       2           3
    # 3    PA No Filter      <NA>                       0           1
    # 4    NJ    Filter        NL                       2           3
    # 5    TX    Filter   Exceeds                       1           1