
Calculating a percentage in a group_by operation in R


I'm trying to calculate the percentage of records where a certain ratio is > 1, grouped by another column, in R.

I start with the following dataframe:

Scenario (chr), Ratio (fl), OtherCols (chr)

I add a 'pass' column to it, which is 1 for a pass and 0 for a fail.

df$pass = with(df, ifelse(Ratio>1,1,0))

Then I'd like to find the percentage of passes within each Scenario group, by summing the 'pass' column and dividing by the total number of rows in that group:

df_pct <- df %>%
group_by(Scenario) %>%
summarise(pass_pct=sum(pass)/nrow(pass))
df_pct

I'm getting an empty tibble back when printing this though. Any advice or better way of doing this?

Thanks!


Solution

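  • nrow() is meant for data frames and matrices; called on a column vector such as pass it returns NULL, so sum(pass)/nrow(pass) yields a zero-length result, which is why the summary comes back empty. Use n() to get the number of rows in the current group instead, e.g.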

    library(dplyr)
    
    df %>% 
      mutate(pass = ifelse(Ratio>1,1,0)) %>% 
      summarize(pass_pct = sum(pass)/n(), .by = Scenario)
    

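    See ?cur_group for other group info functions. Note that the .by argument requires dplyr 1.1.0 or later; on older versions, call group_by(Scenario) before summarize() instead.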
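
As a possible shortcut, the intermediate pass column can be skipped: the mean of a logical vector is the share of TRUE values, so mean(Ratio > 1) gives the pass rate directly. The sketch below assumes a small made-up data frame (the question does not show the real data) and multiplies by 100 to report a percentage rather than a proportion. It uses group_by() so that it should also run on dplyr versions older than 1.1.0.

```r
library(dplyr)

# Hypothetical sample data; the real df is not shown in the question
df <- tibble(
  Scenario = c("A", "A", "A", "B", "B"),
  Ratio    = c(1.2, 0.8, 1.5, 0.9, 0.7)
)

# nrow() on a plain vector returns NULL, which is what broke the original code
is.null(nrow(df$Ratio))

# mean() of a logical vector is the fraction of TRUE values in each group
df_pct <- df %>%
  group_by(Scenario) %>%
  summarise(pass_pct = 100 * mean(Ratio > 1))

df_pct
```

If Ratio can contain missing values, mean(Ratio > 1, na.rm = TRUE) would ignore them; whether that is the right treatment depends on how missing ratios should count.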