Search code examples
rparallel-processingmclapply

R - get worker name when running in parallel


I am running a function in parallel. In order to get progress updates on the state of the work, I would like one but only one worker to report periodically on its progress. My natural thought for how to do this would be to have the function that the workers execute check the name of the worker, and only give the status update if the name matches a particular value. But, I can't find a reliable way to determine this in advance. In Julia for instance, there is a simple myid() function that will give a worker's ID (i.e. 1, 2, etc.). I am looking for something equivalent in R. The best that I've found so far is to have each worker call Sys.getpid(). But, I don't know a reliable way to write my script so that I'll know in advance what one of the pids that gets assigned to a worker would be. The basic functionality script that I'm looking to write looks like the below, with the exception that I'm looking for R's equivalent to the myid() function:

library(parallel)

Test_Fun = function(a){
    for (idx in 1:10){
        Sys.sleep(1)
        if (myid() == 1){
            print(idx)
        }
    }
}

mclapply(1:4, Test_Fun, mc.cores = 4)

Solution

  • The parallel package doesn't provide a worker ID function as of R 3.3.2. There also isn't a mechanism provided to initialize the workers before they start to execute tasks.

    I suggest that you pass an additional task ID argument to the worker function by using the mcmapply function. If the number of tasks is equal to the number of workers, the task ID can be used as a worker ID. For example:

    library(parallel)
    Test_Fun = function(a, taskid){
        for (idx in 1:10){
            Sys.sleep(1)
            if (taskid == 1){
                print(idx)
            }
        }
    }
    mcmapply(Test_Fun, 1:4, 1:4, mc.cores = 4)
    

    But if there are more tasks than workers, you'll only see the progress messages for the first task. You can work around that by initializing each of the workers when they execute their first task:

    WORKERID <- NA  # indicates worker is uninitialized
    Test_Fun = function(a, taskid){
        if (is.na(WORKERID)) WORKERID <<- taskid
        for (idx in 1:10){
            Sys.sleep(1)
            if (WORKERID == 1){
                print(idx)
            }
        }
    }
    cores <- 4
    mcmapply(Test_Fun, 1:8, 1:cores, mc.cores = cores)
    

    Note that this assumes that mc.preschedule is TRUE, which is the default. If mc.preschedule is FALSE and the number of tasks is greater than the number of workers, the situation is much more dynamic because each task is executed by a different worker process and the workers don't all execute concurrently.