
Copying specific files from multiple sub-directories into a single folder in R


I have 3 folders with a large number of files in each. I want to select only a few files from each sub-directory and copy only those files into a new folder. Let's call the 3 folders:

  • desktop/dir/sub_11s_gi01_ab
  • desktop/dir/sub_11f_gi01_b
  • desktop/dir/sub_12s_gi02_ms

The files that need to be copied have the extension ".wang.tax.sum".

Copying all of the other files and then deleting them is not an option because it would take days.

From other questions, I know how to combine all the files into a list and copy all of them, but I don't know how to copy only the files that end with .wang.tax.sum.
I can use the grep function to get a list of the files that I want to transfer, but I'm not sure how to copy that list of files from their sub-directories to a new folder. Here's what I have so far, which does not work.

parent.folder <- "C:/Desktop/dir"
my_dirs <- list.files(path = parent.folder, full.names = T, recursive = T, include.dirs = T)

##this does not work##
a <- grep("wang.tax.sum",my_dirs)
my_dirs <- my_dirs[a]

files <- sapply(my_dirs, list.files, full.names = T)

dir.create("taxsum", recursive = T)

for(file in files) {
  file.copy(file, "taxsum")
}

I know that the grep is not working here, but I'm not sure how to write a function that selects only the files I want and copies them to a single folder. I have roughly 50 sub-folders in total, each holding about 1 GB of data, so again, copying all the data and then deleting what I don't want is not an option. Any help is greatly appreciated.


Solution

  • parent.folder <- "C:/Desktop/dir"
    # list every file below parent.folder, with full paths (directories are not needed)
    files <- list.files(path = parent.folder, full.names = TRUE, recursive = TRUE)
    

    After this you need to select the relevant files:

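    files <- files[grep("\\.wang\\.tax\\.sum$", files)]
    

    (Note the double backslashes before the dots: in a regular expression an unescaped dot matches any character, so a literal dot has to be written as \\. in an R string. The trailing $ anchors the match to the end of the name, so only files that end with .wang.tax.sum are kept.)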

    Or you could do this in one step with the pattern argument of list.files:

    files <- list.files(path = parent.folder, full.names = TRUE, recursive = TRUE, pattern = "\\.wang\\.tax\\.sum$")
    

    Create the destination directory (the path is relative to the current working directory):

    dir.create("taxsum", recursive = TRUE)
    

    Now you need to create new filenames:

    # replace path separators and colons with underscores, so that each new
    # file name contains the original path and stays unique
    newnames <- paste0("taxsum/", gsub("/|:", "_", files))
    
    # alternatively, if you know that the base file names are all different:
    newnames <- paste0("taxsum/", basename(files))
    

    And now you can use mapply to copy each file to its new name (the same can be done with a for loop over the indices, or by passing both vectors straight to file.copy, which is vectorized):

    mapply(file.copy, from = files, to = newnames)
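
For reference, here is a minimal end-to-end sketch of the steps above that can be run safely in a scratch location. It is an illustration under assumptions, not part of the original answer: the temporary `root` directory, the dummy files, and the names `dest` and `copied` are made up for the demo, and the pattern is anchored with `$` so that only names ending in .wang.tax.sum match. It passes both vectors to file.copy directly instead of going through mapply.

```r
# Why the dots need escaping: an unescaped dot matches any character.
grepl("wang.tax.sum", "wangXtaxXsum")      # TRUE
grepl("wang\\.tax\\.sum", "wangXtaxXsum")  # FALSE

# Hypothetical demo tree in a fresh temporary directory
# (stands in for "C:/Desktop/dir" from the question).
root <- tempfile("copy_demo")
parent.folder <- file.path(root, "dir")
subs <- c("sub_11s_gi01_ab", "sub_11f_gi01_b", "sub_12s_gi02_ms")
for (s in subs) {
  dir.create(file.path(parent.folder, s), recursive = TRUE)
  # one file we want and one we do not want in each sub-folder
  file.create(file.path(parent.folder, s, paste0(s, ".wang.tax.sum")))
  file.create(file.path(parent.folder, s, paste0(s, ".other.txt")))
}

# Keep only files whose names end in .wang.tax.sum.
files <- list.files(path = parent.folder, full.names = TRUE, recursive = TRUE,
                    pattern = "\\.wang\\.tax\\.sum$")

# Destination folder (assumed name, placed inside the temporary root).
dest <- file.path(root, "taxsum")
dir.create(dest, recursive = TRUE)

# New names: this assumes the base file names are unique across sub-folders.
newnames <- file.path(dest, basename(files))
stopifnot(anyDuplicated(newnames) == 0)

# file.copy accepts vectors of equal length and returns one logical per file.
copied <- file.copy(from = files, to = newnames)
```

To use it on real data, point parent.folder at the actual directory and drop the loop that creates the dummy files. If files in different sub-folders can share a base name, build newnames from the full path with gsub, as in the answer, instead of using basename.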