
Parallel processing and temporary files


I'm using the mclapply function from the multicore package for parallel processing. All the child processes it starts seem to get the same temporary file names from the tempfile function. For example, with four processors,

library(multicore)
mclapply(1:4, function(x) tempfile())

gives four identical file names. Obviously the temporary files need to be distinct so that the child processes don't overwrite each other's files. When tempfile is used indirectly, i.e. by calling some function that itself calls tempfile, I have no control over the file name.

Is there a way around this? Do other parallel processing packages for R (e.g. foreach) have the same problem?

Update: This is no longer an issue since R 2.14.1.

CHANGES IN R VERSION 2.14.0 patched:

[...]

o tempfile() on a Unix-alike now takes the process ID into account.
  This is needed with multicore (and as part of parallel) because
  the parent and all the children share a session temporary
  directory, and they can share the C random number stream used to
  produce the unique part.  Further, two children can call
  tempfile() simultaneously.
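
A quick way to check whether a given R installation includes this fix is to repeat the experiment from the question and look for duplicates. This is only a sketch: it uses mclapply from the parallel package (mentioned in the note above) rather than multicore, and the number of cores is an arbitrary choice.

```r
library(parallel)

# Forking is unavailable on Windows, where mclapply only accepts mc.cores = 1.
cores <- if (.Platform$OS.type == "unix") 2L else 1L

# Ask each job for a temporary file name, as in the question.
files <- unlist(mclapply(1:4, function(x) tempfile(), mc.cores = cores))

# TRUE when every job received a distinct name.
all_distinct <- anyDuplicated(files) == 0
print(all_distinct)
```

If this prints FALSE, the children are still generating colliding names and a workaround is needed.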

Solution

  • At least for now, I've monkey-patched my way around this with the following code in my .Rprofile, following Daniel's advice to use PID values:

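    # Keep the original tempfile under a new name in the base namespace
    assignInNamespace("tempfile.orig", tempfile, ns="base")
    # Wrapper that appends the process ID to the pattern
    .tempfile = function(pattern="file", tmpdir=tempdir())
        tempfile.orig(paste(pattern, Sys.getpid(), sep=""), tmpdir)
    # Replace base::tempfile with the wrapper
    assignInNamespace("tempfile", .tempfile, ns="base")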
    

    Obviously this isn't a good option for any package you'd distribute, but for a single user's needs it's the best option so far, since it works in all cases.
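
Where the calls to tempfile are in your own code, a less invasive option is a small wrapper instead of a patch to base. The sketch below applies the same PID idea; pid_tempfile is a hypothetical helper name, not part of R, and it does nothing for functions that call tempfile internally.

```r
# Hypothetical helper: like tempfile(), but with the process ID in the name.
pid_tempfile <- function(pattern = "file", tmpdir = tempdir()) {
    tempfile(paste(pattern, Sys.getpid(), "_", sep = ""), tmpdir)
}

f <- pid_tempfile("results")
# The base name starts with the pattern followed by this process's PID.
print(basename(f))
```

Because Sys.getpid() is evaluated inside the child, each forked process gets its own prefix even when the children share the parent's random number stream.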