For presentation purposes I often have to format a data frame with column and row totals and percentages.
Piping the row totals and percentages conditionally is straight forward: stackoverflow e.g.
The column totals can be neatly piped:
option 1: stackoverflow e.g.
option 2: using the janitor package function adorn_totals
(however I'd prefer to find a way without adding more packages to my workflow).
I get stuck on the next step which is to add a column % row below the column total. This row calculates the column sum (column total) as a percentage of table sum (table total).
Here I have to split my workflow to do the following:
This process feels heavy handed and I am sure there is a better way; suggestions welcome.
This is what I am aiming to achieve
Once the table is generated formatting and tidying up for presentation purposes I usually do with flextable or kableExtra as a second pass.
MWE
library(tidyverse)
tib <- tibble(v1 = c("a", "b", "c"),
v2 = 1:3,
v3 = 4:6)
# piping row summaries and column totals
tib <-
tib %>%
mutate(r_sum = rowSums(.[2:3]),
r_pc = r_sum * 100/sum(r_sum)) %>%
bind_rows(summarise_all(., funs(if(is.numeric(.)) sum(.) else "Total")))
# extract gross total
table_total <- tib$r_sum[4]
# function to calculate percentage * 2 as tib includes a column total row
calc_pc <- function(x) {sum(x) * 100 / (table_total * 2)}
# calculate column percentages
col_pc <-
tib %>%
summarise_at(vars(v1:r_sum), funs(if(is.numeric(.)) calc_pc(.) else "Column %"))
# finally bringing it all together for the desired result
tib <-
tib %>%
bind_rows(col_pc)
Using janitor
, we can do everything once we have a precalculated total.
library(janitor, warn.conflicts = FALSE)
library(dplyr, warn.conflicts = FALSE)
tib <- tibble(v1 = c("a", "b", "c"), v2 = 1:3, v3 = 4:6)
total <- tib %>% select(where(is.numeric)) %>% sum
tib %>%
adorn_totals(c("row", "col")) %>%
rowwise() %>%
mutate("Row %" = round(sum(across(where(is.numeric)))/total*50)) %>%
ungroup %>%
bind_rows(summarise(., across(where(is.numeric), ~round(sum(.)/total*50)))) %>%
`[[<-`(nrow(.), 1, value = "Column %") %>%
`[[<-`(nrow(.), ncol(.), value = NA)
#> # A tibble: 5 x 5
#> v1 v2 v3 Total `Row %`
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 a 1 4 5 24
#> 2 b 2 5 7 33
#> 3 c 3 6 9 43
#> 4 Total 6 15 21 100
#> 5 Column % 29 71 100 NA
Created on 2020-05-30 by the reprex package (v0.3.0)
Or slightly longer without janitor
:
library(dplyr, warn.conflicts = FALSE)
tib <- tibble(v1 = c("a", "b", "c"), v2 = 1:3, v3 = 4:6)
total <- tib %>% select(where(is.numeric)) %>% sum
tib %>%
rowwise() %>%
mutate(
Total = sum(across(where(is.numeric))),
"Row %" = round(sum(across(where(is.numeric)))/total*50)
) %>%
ungroup %>%
bind_rows(summarise(., across(where(is.numeric), sum))) %>%
`[[<-`(nrow(.), 1, value = "Total") %>%
bind_rows(summarise(., across(where(is.numeric), ~round(sum(.)/total*50)))) %>%
`[[<-`(nrow(.), 1, value = "Column %") %>%
`[[<-`(nrow(.), ncol(.), value= NA)
#> # A tibble: 5 x 5
#> v1 v2 v3 Total `Row %`
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 a 1 4 5 24
#> 2 b 2 5 7 33
#> 3 c 3 6 9 43
#> 4 Total 6 15 21 100
#> 5 Column % 29 71 100 NA
Created on 2020-05-30 by the reprex package (v0.3.0)
Both can me made a bit shorter if you don't care about the row names, of course.