I am trying to use parLapplyLB
to read excel files
file.list <- list.files(path="folder path", pattern="*.xlsx", full.names=TRUE)
test <- parLapplyLB(cl, file.list, function(x){
readxl::read_excel(x, sheet=sheet_in_file)
}
My problem is that each file in file.list has a different "sheet" name that I need to read. For e.g. in file1, I need to read sheet = "abc", in file2, need to read sheet = "xyz" etc.
I tried few things but I have not been able to achieve this, is there a way to do it using parLapplyLB
?
parLapplyLB
as parallelized variant of lapply
essentially loops over one list. What you want is a multivariate variant which allows you to loop over the 1st, 2nd, 3rd, etc. element in both of your two lists, where the single-threaded version would be Map
(with a big M). The parallelized version mcMap
lives in the parallel
package you are using as well.
We may use parallel::mcMap
.
library(parallel)
r <- mcMap(openxlsx::read.xlsx, file.list, sheet_in_file)
r
# r$`./foo/wb1.xlsx`
# X1 X2 X3
# 1 1 1 1
# 2 1 1 1
#
# $`./foo/wb2.xlsx`
# X1 X2 X3
# 1 2 2 2
# 2 2 2 2
#
# $`./foo/wb3.xlsx`
# X1 X2 X3
# 1 3 3 3
# 2 3 3 3
#
# $`./foo/wb4.xlsx`
# X1 X2 X3
# 1 4 4 4
# 2 4 4 4
(Should work similarly with readxl::read_excel
, don't have it installed on this machine.)
Also see @HenrikB's comment below for a solution using a different package.
Data:
dir.create('foo') ## create dir
## create four lists of four dataframes, write to dir as .xlsx
lapply(1:4, \(j) openxlsx::write.xlsx(lapply(1:4, \(i) data.frame(matrix(i, 2, 3))) |> setNames(LETTERS[1:4]),
file=sprintf('./foo/wb%s.xlsx', j), overwrite=TRUE)) |>invisible()
file.list <- list.files(path = "./foo", pattern="*.xlsx", full.names=TRUE)
sheet_in_file <- c('A', 'B', 'C', 'D')