I am using Python 3 with the rpy2 package to access R: I pass in R code, where my heavy computation is done, and get the results back in Python. In particular, I am using the lfe package in R (documentation here).
I would like to know which option is better: doing the parallelization in Python or in R. Does it matter? Is there any reason to suspect that one is more efficient than the other? Thanks.
Multi-threading is the one exception: it will not work because R cannot handle concurrency (and it is not a great way to parallelize Python code anyway because of the GIL). Any other way of parallelizing either the Python tasks (you'll find reports of people using rpy2 with pyspark and multiprocessing) or the R code (there are R packages for parallelization) will work.
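For the Python-side route, here is a minimal sketch of what process-based parallelism with multiprocessing could look like. The names fit_in_r and run_parallel are made up for illustration, and the R call is only a stand-in for your real lfe code. rpy2 is imported inside the worker on the assumption that each worker process should start its own embedded R rather than share one.

```python
import multiprocessing as mp


def fit_in_r(seed):
    """Hypothetical worker: run one independent R computation."""
    # Imported here (not at module level) so that each worker process
    # initializes its own embedded R interpreter.
    import rpy2.robjects as ro

    ro.r("set.seed(%d)" % seed)
    # Stand-in for the real work; replace with your own R code.
    result = ro.r("mean(rnorm(1000))")
    return float(result[0])


def run_parallel(worker, tasks, processes=2):
    """Apply worker to every task, using separate processes (not threads)."""
    tasks = list(tasks)
    if processes == 1:
        # Serial path in the current process, handy for debugging.
        return [worker(task) for task in tasks]
    with mp.Pool(processes=processes) as pool:
        # pool.map returns the results in the same order as tasks.
        return pool.map(worker, tasks)
```

You would call something like run_parallel(fit_in_r, [1, 2, 3, 4]) from under an if __name__ == "__main__": guard, which multiprocessing needs on platforms that start workers by re-importing the main module. Arguments and return values have to be picklable, so it is easiest to pass plain Python values in and convert the R results to plain Python values before returning them.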