python, r, parallel-processing, rpy2, joblib

What is the most efficient way to parallelize python code that uses rpy2?


I am using Python 3 and, in it, the rpy2 package to access R: I pass in R code where my heavy computation is done and get the results back to Python. In particular, I am using the lfe package in R (documentation here).

I would like to know which option is better: doing the parallelization in Python or in R. Does it matter? Why should we expect one to be more efficient than the other? Thanks.


Solution

  • With the exception of multi-threading (which is not a great way to parallelize CPU-bound Python code anyway because of the GIL, and which is ruled out here because the embedded R cannot handle concurrency), any other approach will work: you can parallelize the tasks on the Python side (you'll find reports of people using rpy2 with pyspark and with multiprocessing) or on the R side (there are R packages for parallelization).
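A minimal sketch of the Python-side multiprocessing approach. Because the embedded R is not thread- or fork-share-safe, each worker process should initialize its own R instance by importing rpy2 inside the worker function, not at module level. The rpy2/lfe calls below are commented out and shown only as an assumed shape of the real work (the actual `felm` formula and data are up to you); a placeholder computation stands in so the skeleton runs as-is:

```python
from multiprocessing import Pool

def run_model(param):
    """Worker executed in a separate process.

    The rpy2 import belongs *inside* this function so that every
    worker gets its own embedded R after the process is spawned/forked.
    """
    # import rpy2.robjects as ro          # assumes rpy2 and R installed
    # ro.r('library(lfe)')                # load the lfe package in R
    # result = ro.r(...)                  # run felm(...) here (hypothetical)
    # return result
    # Placeholder standing in for the heavy R computation:
    return param ** 2

if __name__ == "__main__":
    params = [1, 2, 3, 4]
    # One R session per worker process; results come back to Python.
    with Pool(processes=4) as pool:
        results = pool.map(run_model, params)
    print(results)  # [1, 4, 9, 16]
```

The same pattern applies with joblib's `Parallel`/`delayed` in place of `Pool.map`; the key design point either way is that R state is never shared across processes.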