Tags: r, macos, parallel-processing, firewall, r-future

R parallel makeCluster() hangs infinitely on Mac


I ran into a problem when trying to use the parallel package in R on my Mac.

Here is how the parallel package works normally.

library(parallel)
cl = makeCluster(2) # Using 2 workers as an example
# Your parallel code
stopCluster(cl)

When I ran this code, cl = makeCluster(2) hung indefinitely. I tried to solve it but failed. I also looked at other posts, which suggest potential causes such as insufficient memory or an installation error. Those do not seem to be the problem here: I restarted sessions and reinstalled R, but the problem remained.

I guess the problem is about permissions when R tries to connect to the cores. Here is what I found. I used the future package to trace the process of connecting to the cores. The code and its output are below.

cl <- future::makeClusterPSOCK(2, verbose = TRUE)

Workers: [n = 2] ‘localhost’, ‘localhost’
Base port: 11303
Creating node 1 of 2 ...
- setting up node
Starting worker #1 on ‘localhost’: '/Library/Frameworks/R.framework/Resources/bin/Rscript' --default-packages=datasets,utils,grDevices,graphics,stats,methods -e 'parallel:::.slaveRSOCK()' MASTER=localhost PORT=11303 OUT=/dev/null TIMEOUT=2592000 XDR=TRUE
Waiting for worker #1 on ‘localhost’ to connect back

The problem is that the worker on localhost never connects back.

The following is my session info. I hope this helps.

R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.6

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

loaded via a namespace (and not attached):
[1] compiler_3.5.1 parallel_3.5.1 tools_3.5.1 listenv_0.7.0 codetools_0.2-15 digest_0.6.16
[7] globals_0.12.2 future_1.9.0

It's interesting that the same code works on my old Mac machine (same OS but the hardware is older). I have no idea what is happening here. Any help is appreciated! Thanks!


Solution

  • Several potential reasons include not enough memory, installation error, etc. They do not seem to be the problem here, as I restarted sessions, reinstalled R, but the problem remained.

    Correct, those types of problems should not be involved here. The calls you've shown use basic built-in functionality of R (mostly from the 'parallel' package), and very little memory is involved.

    I guess the problem is about the permission when R tried to connect to cores. [...]

    Both parallel::makeCluster(2) and future::makeClusterPSOCK(2) launch workers (using parallel:::.slaveRSOCK()), which are independent R sessions that run in the background. The master session and these workers communicate via sockets. So, yes, it could be that a firewall is preventing R from opening those ports. (I don't know macOS well enough to troubleshoot that.)
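To check the socket path in isolation, the sketch below imitates the connect-back step with plain base R: the current session listens on a port while a background Rscript process tries to connect to it on localhost. This is not how parallel implements it internally. The helper name check_localhost_socket(), the port 11306 and the 10-second timeout are all assumptions chosen for illustration.

```r
# Minimal sketch: can a child R process connect back to this session on localhost?
# check_localhost_socket(), the port and the timeout are hypothetical choices.
check_localhost_socket <- function(port = 11306L, timeout = 10L) {
  port <- as.integer(port)
  timeout <- as.integer(timeout)

  # Child script: wait briefly so the parent is already listening, then connect.
  script <- tempfile(fileext = ".R")
  writeLines(c(
    "Sys.sleep(1)",
    sprintf("con <- socketConnection('localhost', port = %d, blocking = TRUE, open = 'a+b', timeout = %d)",
            port, timeout),
    "writeLines('hello from child', con)",
    "close(con)"
  ), script)
  on.exit(unlink(script), add = TRUE)

  # Launch the child in the background, discarding its output.
  rscript <- file.path(R.home("bin"), "Rscript")
  system2(rscript, shQuote(script), wait = FALSE, stdout = FALSE, stderr = FALSE)

  # Parent: listen on the port and wait up to `timeout` seconds for the child.
  con <- tryCatch(
    suppressWarnings(socketConnection("localhost", port = port, server = TRUE,
                                      blocking = TRUE, open = "a+b",
                                      timeout = timeout)),
    error = function(e) NULL
  )
  if (is.null(con)) return(FALSE)
  on.exit(close(con), add = TRUE)

  msg <- tryCatch(readLines(con, n = 1), error = function(e) character(0))
  identical(msg, "hello from child")
}

ok <- check_localhost_socket()
print(ok)
```

A FALSE result only suggests that something (a firewall, how localhost resolves, or a port already in use) is getting in the way of local socket connections; it does not identify which.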

    By setting outfile = NULL, you will also get information on what happens on the workers' end. Here is what it should look like when it works:

    > cl <- future::makeClusterPSOCK(1, outfile = NULL, verbose = TRUE)
    Workers: [n = 1] ‘localhost’
    Base port: 11306
    Creating node 1 of 1 ...
    - setting up node
    Starting worker #1 on ‘localhost’: '/usr/lib/R/bin/Rscript' --default-packages=datasets,utils,grDevices,graphics,stats,methods -e 'parallel:::.slaveRSOCK()' MASTER=localhost PORT=11306 OUT= TIMEOUT=2592000 XDR=TRUE
    Waiting for worker #1 on ‘localhost’ to connect back
    starting worker pid=7608 on localhost:11306 at 14:46:57.827
    Connection with worker #1 on ‘localhost’ established
    - assigning connection UUID
    - collecting session information
    Creating node 1 of 1 ... done
    

    PS. You only need one worker to troubleshoot this.
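The same one-worker check can be sketched with the base parallel package. This assumes that outfile = "" (which the parallel::makeCluster documentation describes as "no redirection") shows worker output comparable to outfile = NULL above. If the worker never connects back, the makeCluster() call will hang just as in the question and has to be interrupted.

```r
library(parallel)

# One worker is enough for troubleshooting.
# outfile = "" leaves the worker's stdout/stderr unredirected.
cl <- makeCluster(1, outfile = "")

# If the call above returned, the worker connected back; ask it for its PID.
worker_pids <- clusterEvalQ(cl, Sys.getpid())
print(worker_pids)

stopCluster(cl)
```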