Search code examples
rexcelcontingency

Sum up tables results from multiple sheets into one table in R


I am reading through excel file that has multiple sheets.

 file_to_read <- "./file_name.xlsx"
 
 # Get all names of sheets in the file
 sheet_names <- readxl::excel_sheets(file_to_read)
 
 # Loop through sheets
 L <- lapply(sheet_names, function(x) {
 all_cells <-
 tidyxl::xlsx_cells(file_to_read, sheets = x)
})

L here has all the sheets. Now, I need to get the data from each sheet to combine all the columns and rows into one file. To be exact, I want to sum the matching columns and rows in the data into one file.

I will put simple example to make it clear.

For example, this table in one sheet,

df1 <- data.frame(x = 1:5, y = 2:6, z = 3:7)
rownames(df1) <- LETTERS[1:5]
df1
M x y z
A 1 2 3
B 2 3 4
C 3 4 5
D 4 5 6
E 5 6 7

The second table in the next sheet,

df2 <- data.frame(x = 1:5, y = 2:6, z = 3:7, w = 8:12)
rownames(df2) <- LETTERS[3:7]
df2
M x y z  w
C 1 2 3  8
D 2 3 4  9
E 3 4 5 10
F 4 5 6 11
G 5 6 7 12  

My goal is to combine (sum) the matched records in all 100 tables from one excel file to get one big tables that has the total sum of each value.

The final table should be like this:

M x y  z   w
A 1 2  3   0
B 2 3  4   0
C 4 6  8   8
D 6 8  10  9
E 8 10 12 10
F 4 5  6  11
G 5 6  7  12

Is there a way to achieve this in R? I am not an expert in R, but I wish if I could know how to read all sheets and do the sum Then save the output to a file.

Thank you


Solution

  • One approach that will work is these steps:

    • read each sheet into a list
    • convert each sheet into a long format
    • bind into a single data frame
    • sum and group by over that long data frame
    • cast back to tabular format

    That should work for N sheets with any combination of row and column headers in those sheets. E.g.

    file <- "D:\\Book1.xlsx"
    sheet_names <- readxl::excel_sheets(file)
    sheet_data <- lapply(sheet_names, function(sheet_name) {
      readxl::read_xlsx(path = file, sheet = sheet_name)
    })
    
    # use pivot_longer on each sheet to make long data
    long_sheet_data <- lapply(sheet_data, function(data) {
      long <- tidyr::pivot_longer(
        data = data,
        cols = !M,
        names_to = "col",
        values_to = "val"
      )
    })
    
    # combine into a single tibble
    long_data = dplyr::bind_rows(long_sheet_data)
    
    # sum up matching pairs of `M` and `col`
    summarised <- long_data %>%
      group_by(M, col) %>%
      dplyr::summarise(agg = sum(val))
      
    # convert to a tabular format
    tabular <- summarised %>%
      tidyr::pivot_wider(
        names_from = col,
        values_from = agg,
        values_fill = 0
      )
    
    tabular
    

    I get this output with a spreadsheet using your initial inputs:

    > tabular
    # A tibble: 7 x 5
    # Groups:   M [7]
      M         x     y     z     w
      <chr> <dbl> <dbl> <dbl> <dbl>
    1 A         1     2     3     0
    2 B         2     3     4     0
    3 C         4     6     8     8
    4 D         6     8    10     9
    5 E         8    10    12    10
    6 F         4     5     6    11
    7 G         5     6     7    12