Search code examples
rsummarize

Can I add a totals row to a summary table but with a conditional sum?


I have two columns in a data frame I want to summarize by region. I then want to add a totals row to the summary table, but the total is a count of the number rows in the summary table with a value > 0. I have to repeat this for several different data frames.

region <- c("R1", "R2", "R3", "R4", "R5", "R1", "R2", "R3", "R4", "R5")
housing <- c(0,1,1,1,1,0,1,0,0,1)
food <- c(1,1,0,1,1,0,1,0,0,1)
df <- data.frame(region,housing,food)

Using janitor's adorn_totals is close, but it sums the values for each column.

library(janitor)
df %>%
  group_by(region) %>%
  summarise_at(vars(housing:food), sum, na.rm = TRUE) %>%
  adorn_totals(where = "row", name = "Not the total I want")

sum table close

Nice, but I want to count the number of rows in each column that are greater than 0, like this:

enter image description here

I tried sum with ifelse which I couldn't get to work (like Excel countif). I can create a separate summary table and then bind_rows, but that's a lot of steps. Is there a way to use adorn_totals? Any suggestions are appreciated.


Solution

  • You could summarize the sums first and bind those results to a second summarize result.

    library(dplyr)
    
    summarized_df <- df |>
      group_by(region) |>
      summarise(across(housing:food, \(x) sum(x, na.rm = TRUE))) 
    
    bind_rows(
      summarized_df,
      summarized_df |>
        summarise(region = 'Count',
                  across(housing:food, \(x) sum(x > 0)))
    )
    #> # A tibble: 6 × 3
    #>   region housing  food
    #>   <chr>    <dbl> <dbl>
    #> 1 R1           0     1
    #> 2 R2           2     2
    #> 3 R3           1     0
    #> 4 R4           1     1
    #> 5 R5           2     2
    #> 6 Count        4     4