R parallel system call on files


I have to convert a large number of RAW images and am using the program DCRAW to do that. Since the program uses only one core, I want to parallelize the conversion from R. To call the program I use:

system("dcraw.exe -4 -T image.NEF")

This writes a file called image.tiff to the same folder as the NEF file, which is totally fine. I have tried several R packages to parallelize this, but I only get nonsensical results (probably my own fault). I want to run a large list of files (1000+), obtained with list.files(), through this system call in R.

I could only find info on parallel programming for variables within R but not for system calls. Anybody got any ideas? Thanks!


Solution

  • It doesn't matter whether you use variables or system(). Assuming you're not on Windows (which has no fork(), so mclapply can't run jobs in parallel there), on any decent system you can run

    parallel::mclapply(Sys.glob("*.NEF"),
      function(fn) system(paste("dcraw.exe -4 -T", shQuote(fn))),
      mc.cores=8, mc.preschedule=FALSE)
    

    It will run 8 jobs in parallel. But then you may as well skip R and use GNU parallel instead:

    ls *.NEF | parallel -u -j8 'dcraw.exe -4 -T {}'
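
If you are on Windows after all (the `dcraw.exe` name suggests it), a socket cluster from the same `parallel` package may be an option, since it does not rely on forking. What follows is only a minimal sketch: it assumes `dcraw.exe` is on the PATH, and `run_dcraw` and `convert_parallel` are hypothetical helper names made up for illustration, not part of any package.

```r
library(parallel)

# Hypothetical helper: run dcraw on one file and return its exit status.
# Defined at top level and self-contained so it can be shipped to workers.
run_dcraw <- function(fn, exe = "dcraw.exe") {
  system2(exe, c("-4", "-T", shQuote(fn)))
}

# Hypothetical helper: convert all files on a socket (PSOCK) cluster.
# Returns an integer vector of exit statuses named by file.
convert_parallel <- function(files, exe = "dcraw.exe", workers = 2L) {
  if (length(files) == 0L) {
    return(setNames(integer(0), character(0)))
  }
  cl <- makeCluster(min(workers, length(files)))
  on.exit(stopCluster(cl), add = TRUE)
  # clusterApplyLB hands out one file at a time to whichever worker is free,
  # similar in spirit to mc.preschedule = FALSE above.
  status <- clusterApplyLB(cl, files, run_dcraw, exe = exe)
  setNames(as.integer(unlist(status)), files)
}

# Example call (commented out so that sourcing this file starts no jobs):
# files  <- list.files(pattern = "\\.NEF$", full.names = TRUE)
# status <- convert_parallel(files, workers = 8L)
# names(status)[status != 0]   # files whose conversion reported a failure
```

Returning the exit statuses makes it easy to spot and re-run failed conversions. The worker count is a guess; `parallel::detectCores()` can help pick a sensible value.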