I am reading through excel file that has multiple sheets.
file_to_read <- "./file_name.xlsx"
# Get all names of sheets in the file
sheet_names <- readxl::excel_sheets(file_to_read)
# Loop through sheets
L <- lapply(sheet_names, function(x) {
all_cells <-
tidyxl::xlsx_cells(file_to_read, sheets = x)
})
L here has all the sheets. Now, I need to get the data from each sheet to combine all the columns and rows into one file. To be exact, I want to sum the matching columns and rows in the data into one file.
I will put simple example to make it clear.
For example, this table in one sheet,
df1 <- data.frame(x = 1:5, y = 2:6, z = 3:7)
rownames(df1) <- LETTERS[1:5]
df1
M x y z
A 1 2 3
B 2 3 4
C 3 4 5
D 4 5 6
E 5 6 7
The second table in the next sheet,
df2 <- data.frame(x = 1:5, y = 2:6, z = 3:7, w = 8:12)
rownames(df2) <- LETTERS[3:7]
df2
M x y z w
C 1 2 3 8
D 2 3 4 9
E 3 4 5 10
F 4 5 6 11
G 5 6 7 12
My goal is to combine (sum) the matched records in all 100 tables from one excel file to get one big tables that has the total sum of each value.
The final table should be like this:
M x y z w
A 1 2 3 0
B 2 3 4 0
C 4 6 8 8
D 6 8 10 9
E 8 10 12 10
F 4 5 6 11
G 5 6 7 12
Is there a way to achieve this in R? I am not an expert in R, but I wish if I could know how to read all sheets and do the sum Then save the output to a file.
Thank you
One approach that will work is these steps:
That should work for N sheets with any combination of row and column headers in those sheets. E.g.
file <- "D:\\Book1.xlsx"
sheet_names <- readxl::excel_sheets(file)
sheet_data <- lapply(sheet_names, function(sheet_name) {
readxl::read_xlsx(path = file, sheet = sheet_name)
})
# use pivot_longer on each sheet to make long data
long_sheet_data <- lapply(sheet_data, function(data) {
long <- tidyr::pivot_longer(
data = data,
cols = !M,
names_to = "col",
values_to = "val"
)
})
# combine into a single tibble
long_data = dplyr::bind_rows(long_sheet_data)
# sum up matching pairs of `M` and `col`
summarised <- long_data %>%
group_by(M, col) %>%
dplyr::summarise(agg = sum(val))
# convert to a tabular format
tabular <- summarised %>%
tidyr::pivot_wider(
names_from = col,
values_from = agg,
values_fill = 0
)
tabular
I get this output with a spreadsheet using your initial inputs:
> tabular
# A tibble: 7 x 5
# Groups: M [7]
M x y z w
<chr> <dbl> <dbl> <dbl> <dbl>
1 A 1 2 3 0
2 B 2 3 4 0
3 C 4 6 8 8
4 D 6 8 10 9
5 E 8 10 12 10
6 F 4 5 6 11
7 G 5 6 7 12