Subsetting files in R - Read file name indexing as sequences of 4 digits (e.g. 0001 to 4000, rather than 1 to 4000)

I am trying to use RStudio to subset files from a folder. The files are all named in sequence using 4 digits (e.g. Horse0001.jpg, Horse0002.jpg, etc). However, I'm running into errors because I can't figure out how to make R read the file indexing that way. R instead looks for them as Horse1.jpg, Horse2.jpg, etc, and then tells me it can't run the command because it can't find the file Horse1.jpg (which doesn't exist).

I understand the problem is with the start_index portion of the code, but I can't figure out how to change it.

I hope the above makes sense.

My code is below:

library(fs)  # provides path(), dir_create() and file_copy()

original_dir <- path("data/horsies")
new_base_dir <- path("data/horsies2")
make_subset <- function(subset_name,
                        start_index, end_index) {
  for (category in c("horse", "ponies")) {
    file_name <- glue::glue("{category}.{ start_index:end_index }.png")
    dir_create(new_base_dir / subset_name / category)
    file_copy(original_dir / file_name,
              new_base_dir / subset_name / category / file_name)
  }
}
make_subset("train", start_index = 1, end_index = 2000)
make_subset("validation", start_index = 2001, end_index = 2200)
make_subset("test", start_index = 2201, end_index = 2500)

Thank you in advance!

Solution

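Zero-pad the index with sprintf('%04d', ...), which formats each number with at least four digits and leading zeros. Note that this also drops the dot after {category}, so the names come out as horse0001.png rather than horse.1.png. This should work: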

file_name <- glue::glue("{category}{ sprintf('%04d', start_index:end_index) }.png")
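To see what the format string does on its own, here is a minimal sketch in base R. The index values and the "Horse" prefix are only illustrative.

```r
# "%04d" means: integer, minimum width 4, padded on the left with zeros.
idx <- c(1L, 25L, 2000L)
padded <- sprintf("%04d", idx)
print(padded)
# [1] "0001" "0025" "2000"

# sprintf() is vectorised, so a whole range of names is built in one call.
horse_names <- paste0("Horse", sprintf("%04d", 1:3), ".jpg")
print(horse_names)
# [1] "Horse0001.jpg" "Horse0002.jpg" "Horse0003.jpg"

# Numbers wider than four digits are not truncated.
wide <- sprintf("%04d", 12345L)
print(wide)
# [1] "12345"
```

formatC(idx, width = 4, flag = "0") gives the same padding if you prefer not to use a format string.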

A more efficient implementation, however, skips glue and lets sprintf build the whole file name:

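file_name <- sprintf('%s%04d.png', category, start_index:end_index)

Benchmarking both versions with microbenchmark (1000 runs each) gives a median of about 190 microseconds for sprintf against about 284 for glue: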

> microbenchmark(glue = glue::glue("{category}{ sprintf('%04d', start_index:end_index) }.png"),
+ sprintf = sprintf('%s%04d.png', category, start_index:end_index), times = 1000)
Unit: microseconds
    expr     min      lq     mean  median       uq      max neval
    glue 267.771 278.677 295.2356 284.212 291.2230 7400.090  1000
 sprintf 184.623 187.247 192.3178 189.625 195.1395  456.002  1000
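Putting it together, the function from the question could look roughly like the sketch below. This is not the asker's exact setup: it uses base R (file.path, dir.create, file.copy) instead of fs so that it runs without extra packages, it passes the directories in as arguments, and it assumes the files are named category + 4 digits + ".png" as in the answer above. The check for missing source files and the temporary demo folders are additions for illustration only.

```r
# Hypothetical rewrite of make_subset() using base R only.
make_subset <- function(subset_name, start_index, end_index,
                        original_dir, new_base_dir,
                        categories = c("horse", "ponies")) {
  for (category in categories) {
    # Zero-padded names such as "horse0001.png" (assumed naming pattern).
    file_name <- sprintf("%s%04d.png", category, start_index:end_index)

    target_dir <- file.path(new_base_dir, subset_name, category)
    dir.create(target_dir, recursive = TRUE, showWarnings = FALSE)

    # Fail early with a readable message if a source file is absent.
    src <- file.path(original_dir, file_name)
    missing_src <- src[!file.exists(src)]
    if (length(missing_src) > 0) {
      stop("Missing source file, e.g. ", missing_src[1])
    }

    file.copy(src, file.path(target_dir, file_name))
  }
  invisible(NULL)
}

# Demo with empty placeholder files in a temporary folder.
original_dir <- file.path(tempdir(), "horsies")
new_base_dir <- file.path(tempdir(), "horsies2")
dir.create(original_dir, showWarnings = FALSE)
for (category in c("horse", "ponies")) {
  file.create(file.path(original_dir, sprintf("%s%04d.png", category, 1:5)))
}

make_subset("train", 1, 3, original_dir, new_base_dir)
make_subset("validation", 4, 5, original_dir, new_base_dir)

train_horses <- list.files(file.path(new_base_dir, "train", "horse"))
print(train_horses)
# [1] "horse0001.png" "horse0002.png" "horse0003.png"
```

If your files really are named like Horse0001.jpg, adjust the prefix, capitalisation and extension in the format string to match.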