Search code examples
rdplyrtidyrrvestgt

Add Group Subheader and Subtotal Rows to data.frame or table in R


Objective

I wish to add subheader and subtotal/margin rows within a table. Ultimately, I am looking for a structure shown below, which I will export to Excel with openxlsx and writeData.

2019 2020 2021
A
A1 1001 1157 911
A2 1005 803 1110
A3 1125 897 1190
Total A 3131 2857 3211
B
B1 806 982 1098
B2 1106 945 1080
B3 1057 1123 867
Total B 2969 3050 3045
C
C1 847 1087 1140
C2 1146 966 1176
C3 1071 915 892
Total C 3064 2968 3208
Total All 9164 8875 9464

I suspect the subheaders and subtotals are completely different questions, but I am asking both here in case there is a common method related to each.

Reproducible Code So Far

Create the Sample Data (long format):

d <- data.frame(
  year = rep(c(2019, 2020, 2021), times = 9),
  sector = rep(c("A","B","C"),each = 9),
  subsector = paste0(rep(c("A","B","C"),each = 9), rep(c("1","2","3"), each = 3)),
  value = sample(800:1200, 27, replace = FALSE)
)

Output:

head(d)
#>   year sector subsector value
#> 1 2019      A        A1  1001
#> 2 2020      A        A1  1157
#> 3 2021      A        A1   911
#> 4 2019      A        A2  1005
#> 5 2020      A        A2   803
#> 6 2021      A        A2  1110

Format wide and add a margin (total) row:

library(janitor)
#[snip]warnings[/snip]
library(tidyverse)
#[snip]warnings[/snip]

d %>%
    group_by(year, sector, subsector) %>%
    summarise(sales = sum(value, na.rm = TRUE)) %>% 
    pivot_wider(names_from = year, values_from = sales) %>%
    janitor::adorn_totals(where = "row")

Output:

#> `summarise()` has grouped output by 'year', 'sector'. You can override using the `.groups` argument.
#>  sector subsector 2019 2020 2021
#>       A        A1 1001 1157  911
#>       A        A2 1005  803 1110
#>       A        A3 1125  897 1190
#>       B        B1  806  982 1098
#>       B        B2 1106  945 1080
#>       B        B3 1057 1123  867
#>       C        C1  847 1087 1140
#>       C        C2 1146  966 1176
#>       C        C3 1071  915  892
#>   Total         - 9164 8875 9464

Created on 2022-03-02 by the reprex package (v2.0.1)

The janitor package's adorn_totals() function works well for adding a margin row or column for the entire set. And Sam Firke's response here hints at a solution using tidyr::gather but my data is in a different format. I don't want to "gather" the columns. Others in the same thread show solutions but they place all the totals at the end of the table.

I can imagine a solution where I loop through the sector factors and assemble and combine tables for each sector, but I suspect I am overthinking this and there is a simpler solution.

Is there an existing solution for this objective, or ideas on accomplishing this efficiently/universally?

Please Note: the number of subsectors per sector will vary in the actual data (i.e., some may have only one subsector, others may have several), and there is no naming convention relating the subsector to the sector (i.e., the parent sector will not be part of the child subsectors name: rather than Sector: "A", Subsector: "A1", it might be Sector: "Manufacturing", Subsector: "Cars").

@akrun -- Solution!

Your answer got me 90% of the way there and your subsequent comments lead me to the remaining solution.

gt has a function as_raw_html() which, using xml2::read_html() and rvest::html_table() convert the gt() object to a tibble while keeping the subheaders.

library(dplyr)
library(tidyr)
library(purrr)
library(gt)
library(xml2)
library(rvest)

d <- data.frame(
  year = rep(c(2019, 2020, 2021), times = 9),
  sector = rep(c("A","B","C"),each = 9),
  subsector = paste0(rep(c("A","B","C"),each = 9), rep(c("1","2","3"), each = 3)),
  value = sample(800:1200, 27, replace = FALSE)
)

d %>%
  group_by(year, sector, subsector) %>%
  summarise(sales = sum(value, na.rm = TRUE), .groups = 'drop') %>% 
  pivot_wider(names_from = year, values_from = sales) %>%
  group_by(sector) %>%
  group_modify(~ .x %>% adorn_totals(where = "row")) %>%
  gt() %>% 
  gt::as_raw_html() %>% 
  xml2::read_html() %>% 
  rvest::html_table()
#> [[1]]
#> # A tibble: 15 x 4
#>    subsector `2019` `2020` `2021`
#>    <chr>     <chr>  <chr>  <chr> 
#>  1 A         A      A      A     
#>  2 A1        932    1117   800   
#>  3 A2        925    1078   1090  
#>  4 A3        816    1058   1146  
#>  5 Total     2673   3253   3036  
#>  6 B         B      B      B     
#>  7 B1        862    1181   947   
#>  8 B2        1083   812    912   
#>  9 B3        1079   1130   1097  
#> 10 Total     3024   3123   2956  
#> 11 C         C      C      C     
#> 12 C1        966    895    944   
#> 13 C2        970    1147   1166  
#> 14 C3        1043   1116   826   
#> 15 Total     2979   3158   2936

Created on 2022-03-02 by the reprex package (v2.0.1)

The subheader rows repeat the sector name in all columns; other than that, it looks good.

Interestingly, rvest also has a read_html function that might even reference the xml2::read_html() function, but it did not work in this context.


Solution

  • Instead of applying adorn_totals on the entire summary, use group_modify and then convert to gt

    library(dplyr)
    library(tidyr)
    library(purrr)
    library(janitor)
    library(gt)
    d %>%
      group_by(year, sector, subsector) %>%
      summarise(sales = sum(value, na.rm = TRUE), .groups = 'drop') %>% 
      pivot_wider(names_from = year, values_from = sales) %>%
      group_by(sector) %>%
      group_modify(~ .x %>% adorn_totals(where = "row")) %>%
      
      gt()
    

    -output

    enter image description here


    An option is also to split the column with expss

    library(expss)
    library(openxlsx)
    out <- d %>%
      group_by(year, sector, subsector) %>%
      summarise(sales = sum(value, na.rm = TRUE), .groups = 'drop') %>% 
      pivot_wider(names_from = year, values_from = sales) %>%
      group_by(sector) %>%
      group_modify(~ .x %>% adorn_totals(where = "row")) %>%
      ungroup %>%
      split_columns(columns = 1) 
    wb <- createWorkbook()
    sh <- addWorksheet(wb, "Tables")
    xl_write(out, wb, sh)
    saveWorkbook(wb, file.path(getwd(), "Documents/table1.xlsx"), overwrite = TRUE)
    

    -output

    enter image description here