Comparing R and Python Vectorization and Optimization

In R, optimization can be achieved with the purrr::map() or furrr::future_map() functions. However, I am not sure how optimization works for np.array() methods. In particular, I would like to understand how Python and R scale out to parallel processing [1, 2] in terms of complexity and performance.

Thus, the following question arises:

How does the optimization of np.array() in Python work compared to the purrr::map() and furrr::future_map() functions in R?
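For orientation, here is a rough sketch of what I take to be the closest Python analogues of the three approaches. The function square and the input values are hypothetical and only for illustration. Threads are used for the parallel case to keep the sketch simple; furrr can instead dispatch to separate R sessions through the future package, so this is an approximation rather than an exact equivalent.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def square(x):
    """Hypothetical per-element function, used only for illustration."""
    return x * x


xs = [0.5, 1.5, 2.5, 3.5]

# Roughly like purrr::map(): apply a function to each element, one at a time.
mapped = list(map(square, xs))

# Roughly like furrr::future_map(): the same mapping, dispatched to workers.
# Assumption: threads stand in for the worker sessions that furrr can use.
with ThreadPoolExecutor(max_workers=2) as pool:
    mapped_parallel = list(pool.map(square, xs))

# NumPy vectorization: a single call, with the loop running in compiled code.
vectorized = np.asarray(xs) ** 2

print(mapped)
print(mapped_parallel)
print(vectorized.tolist())
```

The first two variants still call a Python function once per element, whereas the NumPy variant does not, which is why they are not really the same kind of optimization.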

Running a simple timing test (tictoc in R, time.time() in Python), I can observe a big win from vectorization in both cases. Nonetheless, the results also seem to show that R is just fundamentally faster.

Python

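import time

import numpy as np

a = np.random.rand(1000000)
b = np.random.rand(1000000)

# Vectorized version: one call into NumPy's compiled dot product
tic = time.time()
c = np.dot(a, b)
toc = time.time()

print("Vectorized version: " + str(1000 * (toc - tic)) + "ms")

# For loop: accumulate the same dot product one element at a time
c = 0
tic = time.time()
for i in range(1000000):
    c += a[i] * b[i]
toc = time.time()

print("For loop: " + str(1000 * (toc - tic)) + "ms")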

Output

Vectorized version: 54.151296615600586ms

For loop: 676.0082244873047ms
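A single time.time() measurement can be noisy and may include one-off warm-up costs, so the numbers above should be read with caution. A possibly more robust sketch uses the standard timeit module and takes the best of several repeats. The array size, seed, and repeat counts below are arbitrary choices of mine, not part of the original test.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)  # fixed seed; an arbitrary choice
n = 100_000  # smaller than the original 1,000,000 to keep the loop quick
a = rng.random(n)
b = rng.random(n)


def dot_vectorized():
    # One call into NumPy's compiled dot product.
    return np.dot(a, b)


def dot_loop():
    # Same dot product, accumulated element by element in Python.
    total = 0.0
    for i in range(n):
        total += a[i] * b[i]
    return total


# Best of several repeats, to reduce the effect of warm-up and noise.
t_vec = min(timeit.repeat(dot_vectorized, number=10, repeat=5)) / 10
t_loop = min(timeit.repeat(dot_loop, number=1, repeat=3))

print(f"Vectorized version: {1000 * t_vec:.3f} ms")
print(f"For loop: {1000 * t_loop:.3f} ms")
```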

R

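a <- runif(1000000, 0, 1)
b <- runif(1000000, 0, 1)

# Vectorized version: element-wise product followed by a single sum
tictoc::tic("Vectorized version")
c <- sum(a * b)
tictoc::toc()

# For loop: accumulate the same dot product one element at a time
c <- 0
tictoc::tic("For loop")
for (i in seq_along(a)) {
  c <- a[i] * b[i] + c
}
tictoc::toc()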

Output

Vectorized version: 0.013 sec elapsed

For loop: 0.065 sec elapsed

References

[1] R. Ihaka and R. Gentleman, "R: A Language for Data Analysis and Graphics," in Journal of Computational and Graphical Statistics, vol. 5, no. 3, pp. 299-314, 1996, doi: 10.1080/10618600.1996.10474713

[2] S. van der Walt, S. C. Colbert and G. Varoquaux, "The NumPy Array: A Structure for Efficient Numerical Computation," in Computing in Science & Engineering, vol. 13, no. 2, pp. 22-30, March-April 2011, doi: 10.1109/MCSE.2011.37

Solution

I believe NumPy wraps some of its "primitive" objects in wrapper classes that are themselves written in Python (e.g., this one). Looking at the R mirror source, I find, by contrast, an array class that is basically native code (i.e., C). That extra layer of indirection alone could explain the difference in speed, I guess.
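One way to see an extra layer at work in the Python loop, as a small sketch rather than a full explanation: indexing a single element of a NumPy array returns a NumPy scalar object, not a plain Python float, and the loop creates one such object per access. The helper dot_loop below is a hypothetical name used only for this illustration.

```python
import numpy as np

a = np.arange(5, dtype=np.float64)

# Indexing one element creates a NumPy scalar object on each access.
element = a[2]
print(type(element).__name__)  # float64

# Converting once to a list yields plain Python floats instead.
as_list = a.tolist()
print(type(as_list[2]).__name__)  # float


def dot_loop(xs, ys):
    """Hypothetical helper: dot product accumulated element by element."""
    total = 0.0
    for x, y in zip(xs, ys):
        total += x * y
    return total


print(dot_loop(as_list, as_list))  # 30.0
print(float(np.dot(a, a)))  # 30.0
```

Whether this per-element object creation accounts for the whole gap against the R loop is not something this sketch establishes; it only shows that the Python loop does more work per element than the arithmetic alone.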