I am trying to use RStudio to subset files from a folder, which are all named in sequence using 4 digits (e.g. Horse0001.jpg, Horse0002.jpg, etc.). However, I'm running into errors because I can't figure out how to make R read the file indexing that way. R instead looks for them as Horse1.jpg, Horse2.jpg, etc., and therefore tells me it can't run the command because it can't find the file Horse1.jpg (which doesn't exist).
I understand the problem is with the start_index portion of the code, but I can't figure out how to manipulate that.
I hope the above makes sense.
My code is below:
    original_dir <- path("data/horsies")
    new_base_dir <- path("data/horsies2")

    make_subset <- function(subset_name,
                            start_index, end_index) {
      for (category in c("horse", "ponies")) {
        file_name <- glue::glue("{category}.{ start_index:end_index }.png")
        dir_create(new_base_dir / subset_name / category)
        file_copy(original_dir / file_name,
                  new_base_dir / subset_name / category / file_name)
      }
    }

    make_subset("train", start_index = 1, end_index = 2000)
    make_subset("validation", start_index = 2001, end_index = 2200)
    make_subset("test", start_index = 2201, end_index = 2500)
Thank you in advance!
This should work:
file_name <- glue::glue("{category}{ sprintf('%04d', start_index:end_index) }.png")
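For context on why this helps: the format string '%04d' tells sprintf to print an integer padded with leading zeros to a minimum width of 4. A small sketch with made-up index values (they are only for illustration) shows the effect:

```r
# Example indices chosen only for illustration.
idx <- c(1L, 25L, 2000L)

# '%04d': integer, zero-padded, minimum width of 4 characters.
padded <- sprintf("%04d", idx)
print(padded)

# Numbers wider than 4 digits are not truncated, only short ones are padded.
wide <- sprintf("%04d", 12345L)
print(wide)
```

If the real files use a different number of digits, the width in the format string would need to change accordingly (for example '%05d').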
A more efficient implementation, however, would be:
file_name <- sprintf('%s%04d.png', category, start_index:end_index)
> microbenchmark(glue = glue::glue("{category}{ sprintf('%04d', start_index:end_index) }.png"),
+                sprintf = sprintf('%s%04d.png', category, start_index:end_index), times = 1000)
Unit: microseconds
    expr     min      lq     mean  median       uq      max neval
    glue 267.771 278.677 295.2356 284.212 291.2230 7400.090  1000
 sprintf 184.623 187.247 192.3178 189.625 195.1395  456.002  1000
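Putting it together, here is a rough sketch of how the whole function might look with the sprintf fix applied. It is not the asker's exact code: it uses base R (file.path, dir.create, file.copy) instead of fs so that it runs without extra packages, it takes the directories as arguments, and it assumes files named like horse0001.png with a lowercase category and a .png extension, as in the question's code. If the files are really named Horse0001.jpg, the format string would need adjusting. The temporary directories and the five dummy files are purely illustrative.

```r
# Sketch only: base R stand-ins for the fs functions used in the question.
make_subset <- function(original_dir, new_base_dir, subset_name,
                        start_index, end_index,
                        categories = c("horse", "ponies")) {
  for (category in categories) {
    # Zero-pad every index to 4 digits, e.g. horse0001.png, horse0002.png.
    file_name <- sprintf("%s%04d.png", category, start_index:end_index)

    # Create the target folder (and any missing parents) if needed.
    target_dir <- file.path(new_base_dir, subset_name, category)
    dir.create(target_dir, recursive = TRUE, showWarnings = FALSE)

    # file.copy is vectorised over matching 'from' and 'to' vectors.
    file.copy(file.path(original_dir, file_name),
              file.path(target_dir, file_name))
  }
}

# Illustrative setup: dummy source files in a temporary directory.
src <- file.path(tempdir(), "horsies")
dst <- file.path(tempdir(), "horsies2")
dir.create(src, showWarnings = FALSE)
for (category in c("horse", "ponies")) {
  file.create(file.path(src, sprintf("%s%04d.png", category, 1:5)))
}

make_subset(src, dst, "train", start_index = 1, end_index = 3)
copied <- list.files(file.path(dst, "train", "horse"))
print(copied)
```

With the real data, the calls would keep the same index ranges as in the question (1 to 2000, 2001 to 2200, 2201 to 2500), passing the actual source and destination folders.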