I am very new to R and have no experience with regular expressions, so any help would be really appreciated.
I am reading in a directory, and I am trying to find the files whose names contain the number "22953" and then read the newest of them. The date is also written in each file's name.
Files in the directory:
inv_22953_20190828023258_112140.csv
inv_22953_20190721171018_464152.csv
inv_8979_20190828024558_112140.csv
The problem is that I can't rely on the position of the date within the string because, as you can see, some file names have fewer characters. Maybe a solution would be to locate the date between the 2nd and 3rd underscore.
filepath <- "T:/Pricing/Workstreams/Business Management/EU/01_Operations/02_Carveouts/05_ImplementationTest/"
list.files(filepath)[which.max(suppressWarnings(ymd_hm(substr(list.files(filepath, pattern="_22953"),11,22))))]
library(lubridate)
# First find the files with 22953 inside
# (the underscores keep "22953" from matching inside the timestamp or the trailing number)
myFiles <- grep("_22953_", list.files(filepath), value = TRUE)
# Then, isolate the date and which file has the newest (maximum) date:
regex <- "^.*_.*_([0-9]{4})([0-9]{2})([0-9]{2}).*\\.csv$"
fileDates <- as_date(sub(regex, "\\1-\\2-\\3", myFiles))
myFiles[which(fileDates == max(fileDates))]
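The same idea can be tried without touching the file system. The sketch below is a variant under stated assumptions, not the code above: it hard-codes the three file names from the question in place of `list.files(filepath)`, splits each name on underscores with base R's `strsplit`, and compares the full 14-digit timestamp (date and time) so that two files from the same day are still ordered. The names `files`, `parts`, `id`, `stamp`, `wanted`, and `newest` are made up for illustration.

```r
# Sample file names copied from the question (stand-in for list.files(filepath))
files <- c("inv_22953_20190828023258_112140.csv",
           "inv_22953_20190721171018_464152.csv",
           "inv_8979_20190828024558_112140.csv")

# Split every name on "_": field 2 is the id, field 3 is the timestamp
parts <- strsplit(files, "_", fixed = TRUE)
id    <- vapply(parts, `[`, character(1), 2)
stamp <- vapply(parts, `[`, character(1), 3)

# Keep only the files whose id is exactly 22953
wanted <- id == "22953"

# Timestamps are fixed-width YYYYMMDDHHMMSS, so they compare correctly as numbers
newest <- files[wanted][which.max(as.numeric(stamp[wanted]))]
print(newest)
```

Because the split is on the underscore rather than on a character position, a shorter id such as 8979 does not shift the field that holds the date.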
Explanation of the regular expression
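One way to see what the pattern does is to run it on sample names from the question. For these names, `^.*_.*_` ends up matching everything through the second underscore, the three parenthesised groups capture the year, month, and day, and `.*\\.csv$` absorbs the remaining time digits, the trailing number, and the extension. In this sketch the vector `sample_names` is a hypothetical stand-in for `myFiles`, and base R's `as.Date` is used in place of `as_date` so the snippet runs without lubridate.

```r
regex <- "^.*_.*_([0-9]{4})([0-9]{2})([0-9]{2}).*\\.csv$"

# Hypothetical stand-in for myFiles, using names from the question
sample_names <- c("inv_22953_20190828023258_112140.csv",
                  "inv_22953_20190721171018_464152.csv")

# Replace each whole name with "year-month-day" built from the three groups
dates <- sub(regex, "\\1-\\2-\\3", sample_names)
print(dates)

# Parse the strings as dates and pick the name with the latest one
parsed <- as.Date(dates)
latest <- sample_names[which.max(as.numeric(parsed))]
print(latest)
```

Note that only the date part is captured, so two files created on the same day would tie under this pattern.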