
Finding two directories (which are in ten min bins) based on a time. A diabolical directory disaster


I have looked all round and can't find a working solution. A bit of background:

I am using R to find raw images based on a validated image name (all this bit works). The issue is that there are at least 30 date directories, each containing a large number of time directories that are divided up into 10 min bins. Looking in all the bins, or just searching the parent directory, is asking a bit too much computationally. An example format of the bins would be

 R_Experiments\RawImageFinder\Raw\2016-10-08\1536
 R_Experiments\RawImageFinder\Raw\2016-10-08\1546

It is important to note that the bins are not consistent in their starting minutes; the start can vary, and herein lies the problem.

I know what time the image was taken from the file name using the following bit of code

SingleImage <- "Pia1.2016-10-08.1103+N2353_hc.tif"
TimeDir <- sub('.*?\\.\\d{4}-\\d{2}-\\d{2}\\.(\\d{2})(\\d{2}).*', '\\1:\\2', SingleImage)
TimeDir <- sub(':','', TimeDir)
#
> print(TimeDir)
[1] "1103"

So the image could belong in any of the following bins:

 \1053,\1054,\1055,..you get the idea...,\1112,\1113

It just depends on when the bin was started. So I want the "finder" code to look in all possible bins that are within ten mins either side (as per the example above); obviously some of them will not exist. I thought about doing:

TimeDir1 <- as.numeric(TimeDir)+1
TimeDir2 <- as.numeric(TimeDir)+2

but the issue arises if we get to 59 mins, because there is no such thing as 61 mins in the hour (haha).

I then use the following to tell which directories to search, although I am a bit stuck also on how to tell it to look in multiple directories.

  Directorytosearch <- ParentDirectory
 #this has the \ in it, same for time, it works
  Directorytosearch <- sub('$',paste(DateDir), Directorytosearch)
  Directorytosearch <- sub('$',paste(TimeDir), Directorytosearch)


  IMAGEtocopy <- list.files(
      path = c(Directorytosearch),
      recursive = TRUE,
      include.dirs = FALSE,
      full.names = FALSE,
      pattern = SingleImagePattern)

Any help really would be great! Could I be using the strptime function? Many thanks

Jim

Update for @Nya

test <- strptime("1546", format = "%H%M")
dirs[select.image.dir(test, dirs.time)]
> dirs[select.image.dir(test, dirs.time)]
[1] "test/1546"

Solution

  • To list directories, you are looking for the list.dirs() function. Let's assume that the following example was obtained from such a search through all the directories.

    # directories possibly obtained with list.dirs
    dirs <- c("test/1536", "test/1546", "test/1556", "test/1606")
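Since the question mentions that scanning every bin is too expensive, here is a minimal sketch of how `list.dirs()` could be limited to a single date directory. The throwaway tree under `tempdir()` and the names `root` and `bin.dirs` are assumptions made only for illustration; they stand in for the real folder layout.

```r
# Throwaway tree standing in for the real image store (hypothetical layout)
root <- file.path(tempdir(), "RawDemo")
for (p in c("2016-10-08/1536", "2016-10-08/1546", "2016-10-09/0003")) {
  dir.create(file.path(root, p), recursive = TRUE, showWarnings = FALSE)
}

# List only the immediate subdirectories of one date directory,
# so the other dates are never scanned
date.dirs <- list.dirs(file.path(root, "2016-10-08"),
                       full.names = TRUE, recursive = FALSE)

# Keep only the directories whose name looks like a four-digit HHMM bin
bin.dirs <- date.dirs[grepl("^\\d{4}$", basename(date.dirs))]
basename(bin.dirs)
```

With `recursive = FALSE` the listing stays cheap because it does not descend into the bins themselves.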
    

    A good practice then would be to extract both the date and time components from the directories and image file names. Here, I will only use the time since that was the original request.

    # convert times
    dirs.time <- sub(".*/(\\d+)$", "\\1", dirs)
    dirs.time <- strptime(dirs.time, format="%H%M")
    
    # test data, in your case from image file names
    test <- strptime(c("1538", "1559", "1502"), format="%H%M")
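As a rough sketch of the "extract both date and time" suggestion, the date and time components could be parsed together so that the comparison also behaves sensibly around midnight. The paths in `dirs.full` are made up for the example, and `tz = "UTC"` is an assumption chosen only to keep daylight saving changes out of the arithmetic.

```r
# Hypothetical paths in the question's layout: <parent>/<date>/<HHMM>
dirs.full <- c("Raw/2016-10-08/1055", "Raw/2016-10-08/1105", "Raw/2016-10-09/1100")

# The last path component is the time bin, the one before it is the date
dirs.stamp <- paste(basename(dirname(dirs.full)), basename(dirs.full))
dirs.datetime <- as.POSIXct(dirs.stamp, format = "%Y-%m-%d %H%M", tz = "UTC")

# Date and time from the image file name used in the question
SingleImage <- "Pia1.2016-10-08.1103+N2353_hc.tif"
img.stamp <- sub(".*?\\.(\\d{4}-\\d{2}-\\d{2})\\.(\\d{4}).*", "\\1 \\2", SingleImage)
img.datetime <- as.POSIXct(img.stamp, format = "%Y-%m-%d %H%M", tz = "UTC")

# Keep the directories within 10 minutes of the image, date included
gap.mins <- abs(as.numeric(difftime(dirs.datetime, img.datetime, units = "mins")))
candidates <- dirs.full[gap.mins <= 10]
candidates
```

The third directory has a similar clock time but belongs to the next day, so it is not selected.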
    

    The following function selects the candidate directories by checking whether each directory's time lies within 10 minutes either side of the time taken from the image file. It returns the indices of the directories where the image could be located.

    select.image.dir <- function(i, dt){
        res <- NULL
        # adding and subtracting 10 minutes converted to seconds
        ik <- c(i - 600, i + 600)
        condition <- c(ik[1] <= dt & ik[2] >= dt)
        if(any(condition)){
            res <- which(condition)
        } else { res <- NA }
        res
    }    
    

    Note that the updated function accepts a single image file time per call. The returned indices can then be used to extract the path to the image directory. The last time is outside the range of all the directories, so the function returns NA, and indexing dirs with NA yields NA for every element.

    dirs[select.image.dir(test[1], dirs.time)]
    # [1] "test/1536" "test/1546"
    dirs[select.image.dir(test[2], dirs.time)]
    # [1] "test/1556" "test/1606"
    dirs[select.image.dir(test[3], dirs.time)]
    # [1] NA NA NA NA
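An alternative that is closer to the idea in the question is to generate every possible bin name within ten minutes either side, keep the ones that exist, and pass them all to `list.files()`, which accepts a vector of paths. The sketch below is only an illustration: the directory tree under `tempdir()`, the empty test file and the search pattern are assumptions, and a time close to midnight would also need the neighbouring date directory, which is not handled here.

```r
# Throwaway tree standing in for the real Raw folder (hypothetical layout)
ParentDirectory <- file.path(tempdir(), "RawSearchDemo")
DateDir <- "2016-10-08"
for (bin in c("1046", "1056", "1106")) {
  dir.create(file.path(ParentDirectory, DateDir, bin),
             recursive = TRUE, showWarnings = FALSE)
}
file.create(file.path(ParentDirectory, DateDir, "1056",
                      "Pia1.2016-10-08.1103+N2353_hc.tif"))

# Time taken from the image file name, as in the question
TimeDir <- "1103"
img.time <- as.POSIXct(TimeDir, format = "%H%M", tz = "UTC")

# Every minute from 10 before to 10 after. The arithmetic is done in seconds
# on a real time value, so the minutes roll over into the next hour correctly
candidate.bins <- format(img.time + (-10:10) * 60, "%H%M", tz = "UTC")

# Build the full paths and keep only the bins that actually exist
candidate.dirs <- file.path(ParentDirectory, DateDir, candidate.bins)
existing.dirs <- candidate.dirs[dir.exists(candidate.dirs)]

# list.files() takes a vector of paths, so one call searches all of them
IMAGEtocopy <- list.files(path = existing.dirs,
                          pattern = "^Pia1\\.2016-10-08\\.1103",
                          full.names = TRUE)
basename(IMAGEtocopy)
```

This avoids the "61 minutes" problem because the minutes are never added as plain numbers, and at most 21 directory names are checked per image.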