In R, optimization can be achieved with the purrr::map() or furrr::future_map() functions. However, I am not sure how optimization works for operations on NumPy arrays (np.array()) in Python. In particular, I would like to understand how Python and R scale out to parallel processing [1, 2] in terms of complexity and performance.
This raises the following question: how does the optimization of NumPy arrays (np.array()) in Python compare to the purrr::map() and furrr::future_map() functions in R?
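Python has rough counterparts to both R functions: the built-in map() plays a role similar to purrr::map(), and the executors in the standard-library concurrent.futures module provide a map() that runs calls concurrently, loosely comparable to furrr::future_map(). The sketch below is only an illustration under stated assumptions: the helper names (dot_chunk, dot_sequential_map, dot_parallel_map), the chunk count and the worker count are hypothetical choices, not part of NumPy. Threads only run truly in parallel where NumPy releases the GIL, and np.dot itself may already be multithreaded through the BLAS library NumPy was built against. A ProcessPoolExecutor would be closer in spirit to furrr's multisession backend, but it pays the cost of copying the arrays to each worker process.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def dot_chunk(pair):
    """Hypothetical helper: dot product of one (a_chunk, b_chunk) pair."""
    a_chunk, b_chunk = pair
    return float(np.dot(a_chunk, b_chunk))


def dot_sequential_map(a, b, n_chunks=4):
    # Sequential map over chunks: roughly the role purrr::map() plays in R.
    pairs = zip(np.array_split(a, n_chunks), np.array_split(b, n_chunks))
    return sum(map(dot_chunk, pairs))


def dot_parallel_map(a, b, n_chunks=4, workers=4):
    # Same map, dispatched to a thread pool: loosely analogous to
    # furrr::future_map(). Threads only overlap while NumPy releases the GIL.
    pairs = zip(np.array_split(a, n_chunks), np.array_split(b, n_chunks))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(dot_chunk, pairs))


rng = np.random.default_rng(0)
a = rng.random(1_000_000)
b = rng.random(1_000_000)

print(np.dot(a, b))              # one vectorized call
print(dot_sequential_map(a, b))  # chunked, sequential
print(dot_parallel_map(a, b))    # chunked, thread pool
```

All three print the same value up to floating-point rounding; only the way the work is scheduled differs.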
By running a simple tic/toc timing test in both languages, I observe a big win from vectorization in each case. However, the results also seem to suggest that R is just fundamentally faster.
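A single tic/toc measurement can be noisy, and the first call to a routine may include one-time setup costs. A minimal sketch of a more repeatable Python measurement of the same comparison (dot product vs. explicit loop), assuming the standard-library timeit module is acceptable as the timer: each statement runs several times and the fastest run is kept. The loop_dot function name and the repeat counts are illustrative choices.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)
a = rng.random(1_000_000)
b = rng.random(1_000_000)


def loop_dot(x, y):
    # Element-by-element accumulation in pure Python (the "for loop" version).
    total = 0.0
    for i in range(len(x)):
        total += x[i] * y[i]
    return total


# Repeat each measurement and keep the fastest run, which is less affected
# by warm-up and background noise than a single tic/toc pair.
vec_ms = 1000 * min(timeit.repeat(lambda: np.dot(a, b), number=1, repeat=5))
loop_ms = 1000 * min(timeit.repeat(lambda: loop_dot(a, b), number=1, repeat=3))

print(f"Vectorized version: {vec_ms:.3f} ms")
print(f"For loop: {loop_ms:.3f} ms")
```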
```python
import time

import numpy as np

a = np.random.rand(1000000)
b = np.random.rand(1000000)

# Vectorized dot product
tic = time.time()
c = np.dot(a, b)
toc = time.time()
print(f"Vectorized version: {1000 * (toc - tic)}ms")

# Explicit Python loop
c = 0
tic = time.time()
for i in range(1000000):
    c += a[i] * b[i]
toc = time.time()
print(f"For loop: {1000 * (toc - tic)}ms")
```

Output:

```
Vectorized version: 54.151296615600586ms
For loop: 676.0082244873047ms
```
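Note that the Python benchmark times np.dot(a, b), while the R benchmark times sum(a * b). The direct NumPy translation of the R expression is np.sum(a * b), which first allocates a temporary array for the element-wise product and then reduces it, whereas np.dot computes the result in a single call. A minimal sketch comparing the two forms under the same conditions; the function names r_style and dot_style and the repeat counts are hypothetical choices for illustration.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)
a = rng.random(1_000_000)
b = rng.random(1_000_000)


def r_style(x, y):
    # Direct translation of R's sum(a * b): element-wise product, then sum.
    return np.sum(x * y)


def dot_style(x, y):
    # The form used in the Python benchmark: one dot-product call.
    return np.dot(x, y)


for name, fn in [("np.sum(a * b)", r_style), ("np.dot(a, b)", dot_style)]:
    # Average over 10 calls per run, best of 5 runs.
    best = min(timeit.repeat(lambda: fn(a, b), number=10, repeat=5)) / 10
    print(f"{name}: {1000 * best:.3f} ms")
```

Timing the same expression in both languages makes the cross-language comparison more like-for-like.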
```r
a <- runif(1000000, 0, 1)
b <- runif(1000000, 0, 1)

# Vectorized version
tictoc::tic("Vectorized version")
c <- sum(a * b)
tictoc::toc()

# Explicit loop
c <- 0
tictoc::tic("For loop")
for (i in seq_along(a)) {
  c <- a[i] * b[i] + c
}
tictoc::toc()
```

Output:

```
Vectorized version: 0.013 sec elapsed
For loop: 0.065 sec elapsed
```
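For reference, the timings above give the following ratios (rounded; each comes from a single run, so they are only indicative). Within each language, the speed-up from vectorization is:

$$
\frac{676.0\ \text{ms}}{54.2\ \text{ms}} \approx 12.5 \quad (\text{Python}), \qquad \frac{0.065\ \text{s}}{0.013\ \text{s}} = 5.0 \quad (\text{R})
$$

and across languages, R's advantage is:

$$
\frac{676.0\ \text{ms}}{65\ \text{ms}} \approx 10.4 \quad (\text{loop}), \qquad \frac{54.2\ \text{ms}}{13\ \text{ms}} \approx 4.2 \quad (\text{vectorized})
$$

Read this way, R is faster in absolute terms for both variants, while the relative gain from vectorization is larger in Python.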
[1] R. Ihaka and R. Gentleman, "R: A Language for Data Analysis and Graphics," Journal of Computational and Graphical Statistics, vol. 5, no. 3, pp. 299-314, 1996, doi: 10.1080/10618600.1996.10474713.
[2] S. van der Walt, S. C. Colbert, and G. Varoquaux, "The NumPy Array: A Structure for Efficient Numerical Computation," Computing in Science & Engineering, vol. 13, no. 2, pp. 22-30, March-April 2011, doi: 10.1109/MCSE.2011.37.
I believe NumPy wraps some of its "primitive" objects in wrapper classes that are themselves written in Python (e.g., this one). Conversely, when looking at the R mirror source, I find an array class that is essentially native code (i.e., C). That extra layer of indirection alone could perhaps explain the difference in speed.
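One way to probe this hypothesis: an extra Python-level wrapper layer should show up mostly as a roughly fixed cost per call, not as a cost per element. A minimal sketch, assuming that timing np.dot on a tiny input gives an upper bound on that fixed per-call overhead; the sizes, repeat counts and the per_call_seconds helper name are hypothetical choices.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)


def per_call_seconds(n, number):
    """Best average time of one np.dot call on length-n arrays."""
    a = rng.random(n)
    b = rng.random(n)
    runs = timeit.repeat(lambda: np.dot(a, b), number=number, repeat=5)
    return min(runs) / number


small = per_call_seconds(10, number=100_000)    # dominated by call overhead
large = per_call_seconds(1_000_000, number=20)  # dominated by the arithmetic

print(f"n=10:        {small * 1e6:.2f} us per call")
print(f"n=1,000,000: {large * 1e6:.2f} us per call")
# Upper bound on the share of the large call that a fixed overhead could explain.
print(f"overhead share at n=1e6: {small / large:.2%}")
```

If the printed share is small, a fixed per-call indirection cost alone is unlikely to account for a large gap at n = 1,000,000, and the explanation would more likely lie in per-element work or measurement conditions.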