r, file, unlink

What is the fastest way to delete files using R?


I have a folder from which I have to delete approximately 4,000 .rds files every day. The files are no more than a few kilobytes each (max size: 73 KB), but deleting them from R takes a long time, and deleting them manually is just as slow. Is there a much quicker way to delete them?

This is what I currently do to delete the files:

# ***********************************************************************
# METHOD #1:
# list the names of all files in the folder
files2 <- list.files("/Volumes/share/ZZZ/GOOGLE1/")

# call file.remove() on each full path via lapply()
TR <- lapply(files2, function(x) file.remove(paste0("/Volumes/share/ZZZ/GOOGLE1/", x)))

# ***********************************************************************
# METHOD #2:
# pass all full paths to unlink()
do.call(unlink, list(list.files("/Volumes/share/ZZZ/GOOGLE1/", full.names = TRUE)))
# ***********************************************************************
# METHOD #3:
# recursive unlink() on the folder (note: this removes the folder itself too)
unlink("/Volumes/share/ZZZ/GOOGLE1/", recursive = TRUE, force = TRUE)

I tested all three methods by deleting 100 files with each one.

RESULTS:

METHOD #1:
   user  system elapsed
  0.014   0.064  44.133

METHOD #2:
   user  system elapsed
  0.010   0.047  36.447

METHOD #3:
   user  system elapsed
  0.009   0.057  43.400


sessionInfo()
R version 3.3.0 (2016-05-03)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.11.5 (El Capitan)
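
For anyone who wants to reproduce the comparison, here is a minimal sketch of the timing setup. It assumes a scratch directory under tempdir() instead of the network share, so the absolute timings will not match the ones above; make_files() and the file names are hypothetical helpers made up for illustration.

```r
# Hypothetical helper: create n small .rds files in a fresh scratch directory
# and return the directory path.
make_files <- function(n = 100) {
  dir <- tempfile("rds_bench_")
  dir.create(dir)
  for (i in seq_len(n)) {
    saveRDS(runif(10), file.path(dir, paste0(i, ".rds")))
  }
  dir
}

# Method 1: file.remove() on the full paths.
d1 <- make_files()
t1 <- system.time(ok1 <- file.remove(list.files(d1, full.names = TRUE)))

# Method 2: unlink() on the full paths.
d2 <- make_files()
t2 <- system.time(ok2 <- unlink(list.files(d2, full.names = TRUE)))

# Method 3: recursive unlink() on the directory (removes the directory too).
d3 <- make_files()
t3 <- system.time(ok3 <- unlink(d3, recursive = TRUE, force = TRUE))

# Collect the elapsed (wall-clock) seconds for each method.
elapsed <- c(method1 = t1[["elapsed"]],
             method2 = t2[["elapsed"]],
             method3 = t3[["elapsed"]])
print(elapsed)
```

On a local disk all three should finish almost instantly, which would suggest that the slowness comes from the share rather than from the R function used.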

Solution

  • unlink() accepts wildcards, so you can do the following, which seems quite fast on my system:

    system.time({ unlink('*.rds'); }); ## deleted 4000 ~65KB files
    ##    user  system elapsed
    ##   0.140   0.922   1.151
    

    Note that @Thomas's suggestion of using system() with wait=FALSE is a good idea, but it has several drawbacks: (1) it is platform-dependent; (2) you cannot check the return code of the removal command, since it runs asynchronously; and (3) it may introduce a race condition. For example, if subsequent code quickly writes a new *.rds file, that file could end up being deleted by the asynchronous removal command.
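
    Since '*.rds' on its own is resolved against the current working directory, it may be safer to build the pattern from the folder path and to check what unlink() returns. The following is only a sketch: target_dir and the file names are made up for illustration, and a scratch directory under tempdir() stands in for the real folder.

```r
# Scratch directory standing in for the real folder (hypothetical path).
target_dir <- file.path(tempdir(), "rds_cleanup_demo")
dir.create(target_dir, showWarnings = FALSE)

# Create a few small .rds files plus one file that should survive.
for (i in 1:20) {
  saveRDS(i, file.path(target_dir, sprintf("file_%02d.rds", i)))
}
writeLines("keep me", file.path(target_dir, "notes.txt"))

# Build the wildcard pattern from the directory path so the call does not
# depend on the current working directory.
pattern <- file.path(target_dir, "*.rds")

# unlink() runs synchronously and returns 0 for success, 1 for failure.
status <- unlink(pattern)
if (status != 0) {
  warning("some .rds files could not be removed")
}

# Only the non-.rds file should be left.
remaining <- list.files(target_dir)
print(remaining)
```

    Because the call blocks until the files are gone, later code that writes new *.rds files cannot race with the deletion.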