Tags: numpy, intel-mkl, atlas

Why is the hyped Intel MKL Numpy build slower than the ATLAS build on my PC?


I "dual boot" Ubuntu 11.04, Ubuntu 12.04 and Windows XP SP3, all fully updated. The PC is a rather old Intel Celeron D CPU at 3.06 GHz with 2 GB of RAM.

In Ubuntu 11.04 I have Numpy compiled with ATLAS (ATLAS compiled from source)
In Ubuntu 12.04 I have Numpy built with the latest available MKL, icc and ifort
In Windows XP I have Numpy with MKL (from the Python packages kindly provided by Christoph Gohlke)
More details here: http://pastebin.com/raw.php?i=wxuFbyVg
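To confirm which BLAS/LAPACK each of the three Numpy installs is actually linked against, one option is NumPy's `np.show_config()`, which prints the build-time configuration. A minimal sketch, assuming a NumPy version where `show_config()` prints to stdout, that captures that output as a string so it can be saved and compared across the three systems (the helper name `numpy_build_info` is hypothetical, and the exact output format differs between NumPy versions):

```python
import contextlib
import io

import numpy as np


def numpy_build_info() -> str:
    """Return the text that np.show_config() prints about NumPy's build."""
    buf = io.StringIO()
    # show_config() prints to stdout; redirect it into a string buffer.
    with contextlib.redirect_stdout(buf):
        np.show_config()
    return buf.getvalue()


if __name__ == "__main__":
    info = numpy_build_info()
    print("NumPy", np.__version__)
    print(info)
    # Rough text search: does the config mention these libraries at all?
    for name in ("mkl", "atlas", "openblas"):
        print(f"{name}: {'mentioned' if name in info.lower() else 'not mentioned'}")
```

A plain substring search is only a rough hint; reading the full printed configuration is more reliable.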

I tried a simple test:
%timeit np.dot(np.ones((1000,1000)), np.ones((1000,1000)))

and got these results:

Ubuntu ATLAS: 1 loops, best of 3: 457 ms per loop
Windows MKL:  1 loops, best of 3: 680 ms per loop
Ubuntu MKL:   1 loops, best of 3: 1.04 s per loop
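When comparing builds across operating systems, it can help to run the identical script everywhere instead of relying on IPython. A minimal sketch using the standard-library `timeit` module that approximates the `%timeit` line above with "1 loops, best of 3" (the function name `best_dot_time` is hypothetical; as in the one-liner, creating the arrays is included in the timing):

```python
import timeit

import numpy as np


def best_dot_time(n: int = 1000, repeat: int = 3) -> float:
    """Best-of-`repeat` wall time in seconds for one n x n matrix product.

    Each timing runs the statement once (number=1), mirroring
    "1 loops, best of 3". The np.ones() calls are timed too.
    """
    def run() -> None:
        np.dot(np.ones((n, n)), np.ones((n, n)))

    times = timeit.repeat(run, repeat=repeat, number=1)
    return min(times)


if __name__ == "__main__":
    print(f"best of 3: {best_dot_time() * 1000:.0f} ms per loop")
```

Moving the `np.ones` calls out of `run()` would isolate the BLAS-backed product from array allocation.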

Thinking the above might be a bad example, I searched for one of the many available comparisons, namely the first Google hit: http://dpinte.wordpress.com/2010/01/15/numpy-performance-improvement-with-the-mkl/

I tested the same functions:

%timeit test_eigenvalue()
Ubuntu Atlas: 1 loops, best of 3: 6.38 s per loop
Windows MKL:  1 loops, best of 3: 2.22 s per loop
Ubuntu MKL:   1 loops, best of 3: 3.58 s per loop

%timeit test_svd()
Ubuntu Atlas: 1 loops, best of 3: 2.13 s per loop
Windows MKL:  1 loops, best of 3: 2.06 s per loop
Ubuntu MKL:   1 loops, best of 3: 3.09 s per loop

%timeit test_inv()
Ubuntu Atlas: 1 loops, best of 3: 964 ms per loop
Windows MKL:  1 loops, best of 3: 1.02 s per loop
Ubuntu MKL:   1 loops, best of 3: 1.59 s per loop

%timeit test_det()
Ubuntu Atlas: 1 loops, best of 3: 308 ms per loop
Windows MKL:  1 loops, best of 3: 322 ms per loop
Ubuntu MKL:   1 loops, best of 3: 491 ms per loop

%timeit test_dot()
Ubuntu Atlas: 1 loops, best of 3: 1.5 s per loop        
Windows MKL:  1 loops, best of 3: 1.77 s per loop
Ubuntu MKL:   1 loops, best of 3: 2.77 s per loop
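The blog's `test_*` functions are not reproduced in this question. The harness below is a sketch of the same idea, built on hypothetical stand-ins: the random matrices and the size `N = 300` are assumptions, not the sizes used in the original post, so absolute times will not match the tables above.

```python
import timeit

import numpy as np

# Hypothetical stand-ins for the blog's test functions. The matrix size is
# an assumption, NOT the size used in the original post.
rng = np.random.default_rng(0)
N = 300
A = rng.random((N, N))
B = rng.random((N, N))

TESTS = {
    "test_eigenvalue": lambda: np.linalg.eigvals(A),
    "test_svd": lambda: np.linalg.svd(A),
    "test_inv": lambda: np.linalg.inv(A),
    "test_det": lambda: np.linalg.det(A),
    "test_dot": lambda: np.dot(A, B),
}


def run_all(repeat: int = 3) -> dict:
    """Return the best-of-`repeat` time in seconds for each test."""
    return {
        name: min(timeit.repeat(func, repeat=repeat, number=1))
        for name, func in TESTS.items()
    }


if __name__ == "__main__":
    for name, best in run_all().items():
        print(f"{name:16s} best of 3: {best:.3f} s")
```

Using a fixed seed keeps the inputs identical on every machine, so only the linked BLAS/LAPACK and the hardware differ between runs.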

So, for some reason, the ATLAS-compiled Numpy gives the best results in most of these tests.
Does anyone know what could be the problem?
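Tabulating the numbers above makes the pattern easier to see. A small sketch (pure arithmetic on the timings reported in this question, nothing re-measured; the helper `summarize` is hypothetical) that picks the fastest build for each test and the ratio between the two MKL builds:

```python
# Best-of-3 timings reported above, converted to seconds.
RESULTS = {
    "np.dot one-liner": {"Ubuntu ATLAS": 0.457, "Windows MKL": 0.680, "Ubuntu MKL": 1.04},
    "test_eigenvalue": {"Ubuntu ATLAS": 6.38, "Windows MKL": 2.22, "Ubuntu MKL": 3.58},
    "test_svd": {"Ubuntu ATLAS": 2.13, "Windows MKL": 2.06, "Ubuntu MKL": 3.09},
    "test_inv": {"Ubuntu ATLAS": 0.964, "Windows MKL": 1.02, "Ubuntu MKL": 1.59},
    "test_det": {"Ubuntu ATLAS": 0.308, "Windows MKL": 0.322, "Ubuntu MKL": 0.491},
    "test_dot": {"Ubuntu ATLAS": 1.5, "Windows MKL": 1.77, "Ubuntu MKL": 2.77},
}


def summarize(results):
    """Yield (test, fastest build, Ubuntu MKL time / Windows MKL time)."""
    for test, times in results.items():
        fastest = min(times, key=times.get)
        ratio = times["Ubuntu MKL"] / times["Windows MKL"]
        yield test, fastest, ratio


if __name__ == "__main__":
    for test, fastest, ratio in summarize(RESULTS):
        print(f"{test:18s} fastest: {fastest:13s} Ubuntu MKL / Windows MKL = {ratio:.2f}x")
```

On these numbers, ATLAS is fastest in four of the six tests, Windows MKL wins the eigenvalue and SVD tests, and the Ubuntu MKL build takes roughly 1.5 to 1.6 times as long as the Windows MKL build in every test.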


Solution

  • Intel® MKL is designed and optimized primarily for server and high-performance desktop and mobile processors. The Celeron D was a relatively low-performance processor, so MKL was never optimized for it. For example, if you check the SVD performance on a recent Intel Core i7 desktop, MKL-enabled NumPy can run as much as 80% faster than ATLAS-enabled NumPy. See here: http://software.intel.com/en-us/articles/numpy-scipy-with-mkl/
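MKL is known to dispatch to different code paths depending on the instruction-set extensions the CPU reports, so it can be informative to check which ones a machine has. On Linux, one way is to read the `flags` line of `/proc/cpuinfo`. A minimal, Linux-only sketch; the list of extensions checked is an illustrative assumption, not MKL's actual dispatch criteria, and the helper names are hypothetical:

```python
from pathlib import Path

# Illustrative subset of x86 SIMD extensions (an assumption, not taken from
# Intel's documentation). Linux reports SSE3 under the flag name "pni".
EXTENSIONS = {
    "SSE2": "sse2",
    "SSE3": "pni",
    "SSSE3": "ssse3",
    "SSE4.1": "sse4_1",
    "SSE4.2": "sse4_2",
    "AVX": "avx",
    "AVX2": "avx2",
    "AVX-512F": "avx512f",
}


def cpu_flags(cpuinfo_text: str) -> set:
    """Parse the first 'flags' line of /proc/cpuinfo text into a set."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            _, _, value = line.partition(":")
            return set(value.split())
    return set()


def report(flags: set) -> dict:
    """Map each extension name to whether the CPU reports its flag."""
    return {name: flag in flags for name, flag in EXTENSIONS.items()}


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")
    if path.exists():
        for name, present in report(cpu_flags(path.read_text())).items():
            print(f"{name:9s} {'yes' if present else 'no'}")
    else:
        print("/proc/cpuinfo not found (this sketch is Linux-only)")
```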

    By the way, to get faster responses to MKL related questions please join the Intel MKL forum: http://software.intel.com/en-us/forums/intel-math-kernel-library/