I have several files in one place. I merge them like this:
punkty_pgd <- read_xlsx("C:/Users/Desktop/PC- 2021042500.xlsx", sheet = "NEW")
punkty_pgd_2 <- read_xlsx("C:/Users/Desktop/PC- 2021042512.xlsx", sheet = "NEW")
punkty_pgd_3 <- read_xlsx("C:/Users/Desktop/PC- 2021042600.xlsx", sheet = "NEW")
punkty_pgd <- rbind(punkty_pgd, punkty_pgd_2, punkty_pgd_3)
There are dates in their names and I would like to limit this merging somehow, that is, only merge files that have a date range in the name: from date_start to date_end
For the "load many files and process them", a good place to start is https://stackoverflow.com/a/24376207/3358227.
Lacking that, let's address the filename/date thing.
First, assuming there are files you don't want to load (this time), then we should first get the names of the files.
files <- list.files("c:/Users/Desktop", pattern = "\\.xlsx$", full.names = TRUE)
### add a fake file without a date, for testing
files <- c(files, "Something.xlsx")
files
# [1] "C:/Users/Desktop/PC- 2021042500.xlsx"
# [2] "C:/Users/Desktop/PC- 2021042512.xlsx"
# [3] "C:/Users/Desktop/PC- 2021042600.xlsx"
# [4] "Something.xlsx"
Now we can determine what the Date
is by parsing the filenames.
files <- cbind(strcapture("\\b([0-9]{10})\\b", files, list(fn = "")), origfn = files)
files
# fn origfn
# 1 2021042500 C:/Users/Desktop/PC- 2021042500.xlsx
# 2 2021042512 C:/Users/Desktop/PC- 2021042512.xlsx
# 3 2021042600 C:/Users/Desktop/PC- 2021042600.xlsx
# 4 <NA> Something.xlsx
(strcapture
returns a data.frame
, and I'll keep it in there for convenience.)
files$date <- as.Date(files$fn, format = "%Y%m%d%H")
files
# fn origfn date
# 1 2021042500 C:/Users/Desktop/PC- 2021042500.xlsx 2021-04-25
# 2 2021042512 C:/Users/Desktop/PC- 2021042512.xlsx 2021-04-25
# 3 2021042600 C:/Users/Desktop/PC- 2021042600.xlsx 2021-04-26
# 4 <NA> Something.xlsx <NA>
Now we can filter just the files we need for a specific date range.
base R:
dates <- as.Date(c("2021-04-20", "2021-04-25"))
loadthese <- subset(files, dates[1] <= date & date <= dates[2])
loadthese
# fn origfn date
# 1 2021042500 C:/Users/Desktop/PC- 2021042500.xlsx 2021-04-25
# 2 2021042512 C:/Users/Desktop/PC- 2021042512.xlsx 2021-04-25
list_of_frames <- lapply(loadthese$origfn, readxl::read_xlsx, sheet = "NEW")
punkty_pgd <- do.call(rbind, list_of_frames)
tidyverse
dates <- as.Date(c("2021-04-20", "2021-04-25"))
punkty_pgd <- files %>%
filter(between(date, dates[1], dates[2])) %>%
pull(origfn) %>%
map_dfr(~ read_xlsx(., sheet = "NEW"))