
Subsetting files in R: reading file-name indices as 4-digit sequences (e.g. 0001 to 4000 rather than 1 to 4000)


I am trying to use RStudio to subset files from a folder. The files are all named in sequence with a 4-digit index (e.g. Horse0001.jpg, Horse0002.jpg, etc.). However, I can't figure out how to make R build the file names with that indexing. R instead looks for them as Horse1.jpg, Horse2.jpg, etc., so the command fails because it can't find the file Horse1.jpg (which doesn't exist).

I understand the problem is with the start_index portion of the code, but I can't figure out how to change it.

I hope the above makes sense.

My code is below:

library(fs)  # provides path(), dir_create(), file_copy() and `/` for paths

original_dir <- path("data/horsies")
new_base_dir <- path("data/horsies2")
make_subset <- function(subset_name,
                        start_index, end_index) {
  for (category in c("horse", "ponies")) {
    file_name <- glue::glue("{category}.{ start_index:end_index }.png")
    dir_create(new_base_dir / subset_name / category)
    file_copy(original_dir / file_name,
              new_base_dir / subset_name / category / file_name)
  }
}
make_subset("train", start_index = 1, end_index = 2000)
make_subset("validation", start_index = 2001, end_index = 2200)
make_subset("test", start_index = 2201, end_index = 2500)

Thank you in advance!


Solution

  • This should work; sprintf('%04d', ...) left-pads each index with zeros to a width of four, so 1 becomes 0001:

    file_name <- glue::glue("{category}{ sprintf('%04d', start_index:end_index) }.png")
    

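    A faster alternative, however, is to drop glue and build the names with sprintf() alone (a median of about 190 vs. 284 microseconds in the benchmark below):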

    file_name <- sprintf('%s%04d.png', category, start_index:end_index) 
    
    > microbenchmark(glue = glue::glue("{category}{ sprintf('%04d', start_index:end_index) }.png"),
    + sprintf = sprintf('%s%04d.png', category, start_index:end_index), times = 1000)
    Unit: microseconds
        expr     min      lq     mean  median       uq      max neval
        glue 267.771 278.677 295.2356 284.212 291.2230 7400.090  1000
     sprintf 184.623 187.247 192.3178 189.625 195.1395  456.002  1000
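The zero-padding that both versions rely on can be checked on its own. A minimal base-R sketch: `%04d` in sprintf() pads an integer with leading zeros to a minimum width of 4, and formatC() with flag = "0" is a base-R equivalent (an alternative not used in the answer). The width is a minimum, not a maximum, so an index above 9999 keeps all five digits.

```r
# Indices as integers, as produced by start_index:end_index
idx <- c(1L, 42L, 2000L)

# "%04d": integer, zero-padded to at least 4 characters
sprintf("%04d", idx)
#> [1] "0001" "0042" "2000"

# Base-R alternative with the same result
formatC(idx, width = 4, flag = "0")
#> [1] "0001" "0042" "2000"

# The width is only a minimum: longer numbers are not truncated
sprintf("%04d", 12345L)
#> [1] "12345"
```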
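For reference, here is a sketch of the whole make_subset() function using the sprintf() naming. It is a base-R variant swapped in for illustration (file.path(), dir.create() and file.copy() instead of the fs functions in the question), it assumes files named like horse0001.png sitting directly in the source folder, and it adds a hypothetical pre-check that stops with a missing file name instead of failing partway through the copy. The directory defaults are placeholders; if your files are named like Horse0001.jpg, adjust the format string (capitalization and extension) to match.

```r
# Hypothetical base-R variant of make_subset(); not the answer's code.
# Assumes source files are named "<category><4-digit index>.png".
make_subset <- function(subset_name, start_index, end_index,
                        original_dir = "data/horsies",   # placeholder path
                        new_base_dir = "data/horsies2",  # placeholder path
                        categories = c("horse", "ponies")) {
  for (category in categories) {
    # Zero-padded names, e.g. 1 -> "horse0001.png"
    file_name <- sprintf("%s%04d.png", category, start_index:end_index)
    src <- file.path(original_dir, file_name)

    # Stop early, naming a missing file, before anything is copied
    absent <- src[!file.exists(src)]
    if (length(absent) > 0) {
      stop(length(absent), " file(s) not found, e.g. ", absent[1])
    }

    # Create <new_base_dir>/<subset_name>/<category> if needed
    dest_dir <- file.path(new_base_dir, subset_name, category)
    dir.create(dest_dir, recursive = TRUE, showWarnings = FALSE)

    # file.copy() returns FALSE for files it did not copy
    # (e.g. the destination already exists and overwrite = FALSE)
    ok <- file.copy(src, file.path(dest_dir, file_name))
    if (!all(ok)) {
      warning(sum(!ok), " file(s) were not copied for ", category)
    }
  }
  invisible(NULL)
}
```

It is called the same way as in the question, e.g. `make_subset("train", start_index = 1, end_index = 2000)`.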