I wish to add subheader and subtotal/margin rows within a table. Ultimately, I am looking for a structure shown below, which I will export to Excel with openxlsx
and writeData
.
2019 | 2020 | 2021 | |
---|---|---|---|
A | |||
A1 | 1001 | 1157 | 911 |
A2 | 1005 | 803 | 1110 |
A3 | 1125 | 897 | 1190 |
Total A | 3131 | 2857 | 3211 |
B | |||
B1 | 806 | 982 | 1098 |
B2 | 1106 | 945 | 1080 |
B3 | 1057 | 1123 | 867 |
Total B | 2969 | 3050 | 3045 |
C | |||
C1 | 847 | 1087 | 1140 |
C2 | 1146 | 966 | 1176 |
C3 | 1071 | 915 | 892 |
Total C | 3064 | 2968 | 3208 |
Total All | 9164 | 8875 | 9464 |
I suspect the subheaders and subtotals are completely different questions, but I am asking both here in case there is a common method related to each.
Create the Sample Data (long format):
d <- data.frame(
year = rep(c(2019, 2020, 2021), times = 9),
sector = rep(c("A","B","C"),each = 9),
subsector = paste0(rep(c("A","B","C"),each = 9), rep(c("1","2","3"), each = 3)),
value = sample(800:1200, 27, replace = FALSE)
)
Output:
head(d)
#> year sector subsector value
#> 1 2019 A A1 1001
#> 2 2020 A A1 1157
#> 3 2021 A A1 911
#> 4 2019 A A2 1005
#> 5 2020 A A2 803
#> 6 2021 A A2 1110
Format wide and add a margin (total) row:
library(janitor)
#[snip]warnings[/snip]
library(tidyverse)
#[snip]warnings[/snip]
d %>%
group_by(year, sector, subsector) %>%
summarise(sales = sum(value, na.rm = TRUE)) %>%
pivot_wider(names_from = year, values_from = sales) %>%
janitor::adorn_totals(where = "row")
Output:
#> `summarise()` has grouped output by 'year', 'sector'. You can override using the `.groups` argument.
#> sector subsector 2019 2020 2021
#> A A1 1001 1157 911
#> A A2 1005 803 1110
#> A A3 1125 897 1190
#> B B1 806 982 1098
#> B B2 1106 945 1080
#> B B3 1057 1123 867
#> C C1 847 1087 1140
#> C C2 1146 966 1176
#> C C3 1071 915 892
#> Total - 9164 8875 9464
Created on 2022-03-02 by the reprex package (v2.0.1)
The janitor
package's adorn_totals()
function works well for adding a margin row or column for the entire set. And Sam Firke's response here hints at a solution using tidyr::gather
but my data is in a different format. I don't want to "gather" the columns. Others in the same thread show solutions but they place all the totals at the end of the table.
I can imagine a solution where I loop through the sector factors and assemble and combine tables for each sector, but I suspect I am overthinking this and there is a simpler solution.
Is there an existing solution for this objective, or ideas on accomplishing this efficiently/universally?
Please Note: the number of subsectors per sector will vary in the actual data (i.e., some may have only one subsector, others may have several), and there is no naming convention relating the subsector to the sector (i.e., the parent sector will not be part of the child subsectors name: rather than Sector: "A", Subsector: "A1", it might be Sector: "Manufacturing", Subsector: "Cars").
Your answer got me 90% of the way there and your subsequent comments lead me to the remaining solution.
gt
has a function as_raw_html()
which, using xml2::read_html()
and rvest::html_table()
convert the gt()
object to a tibble while keeping the subheaders.
library(dplyr)
library(tidyr)
library(purrr)
library(gt)
library(xml2)
library(rvest)
d <- data.frame(
year = rep(c(2019, 2020, 2021), times = 9),
sector = rep(c("A","B","C"),each = 9),
subsector = paste0(rep(c("A","B","C"),each = 9), rep(c("1","2","3"), each = 3)),
value = sample(800:1200, 27, replace = FALSE)
)
d %>%
group_by(year, sector, subsector) %>%
summarise(sales = sum(value, na.rm = TRUE), .groups = 'drop') %>%
pivot_wider(names_from = year, values_from = sales) %>%
group_by(sector) %>%
group_modify(~ .x %>% adorn_totals(where = "row")) %>%
gt() %>%
gt::as_raw_html() %>%
xml2::read_html() %>%
rvest::html_table()
#> [[1]]
#> # A tibble: 15 x 4
#> subsector `2019` `2020` `2021`
#> <chr> <chr> <chr> <chr>
#> 1 A A A A
#> 2 A1 932 1117 800
#> 3 A2 925 1078 1090
#> 4 A3 816 1058 1146
#> 5 Total 2673 3253 3036
#> 6 B B B B
#> 7 B1 862 1181 947
#> 8 B2 1083 812 912
#> 9 B3 1079 1130 1097
#> 10 Total 3024 3123 2956
#> 11 C C C C
#> 12 C1 966 895 944
#> 13 C2 970 1147 1166
#> 14 C3 1043 1116 826
#> 15 Total 2979 3158 2936
Created on 2022-03-02 by the reprex package (v2.0.1)
The subheader rows repeat the sector name in all columns; other than that, it looks good.
Interestingly, rvest
also has a read_html
function that might even reference the xml2::read_html()
function, but it did not work in this context.
Instead of applying adorn_totals
on the entire summary, use group_modify
and then convert to gt
library(dplyr)
library(tidyr)
library(purrr)
library(janitor)
library(gt)
d %>%
group_by(year, sector, subsector) %>%
summarise(sales = sum(value, na.rm = TRUE), .groups = 'drop') %>%
pivot_wider(names_from = year, values_from = sales) %>%
group_by(sector) %>%
group_modify(~ .x %>% adorn_totals(where = "row")) %>%
gt()
-output
An option is also to split the column with expss
library(expss)
library(openxlsx)
out <- d %>%
group_by(year, sector, subsector) %>%
summarise(sales = sum(value, na.rm = TRUE), .groups = 'drop') %>%
pivot_wider(names_from = year, values_from = sales) %>%
group_by(sector) %>%
group_modify(~ .x %>% adorn_totals(where = "row")) %>%
ungroup %>%
split_columns(columns = 1)
wb <- createWorkbook()
sh <- addWorksheet(wb, "Tables")
xl_write(out, wb, sh)
saveWorkbook(wb, file.path(getwd(), "Documents/table1.xlsx"), overwrite = TRUE)
-output