Search code examples
pythonperformancenumpy

why is numpy.linalg.norm slow when called many times for small size data?


import numpy as np
from datetime import datetime
import math

def norm(l):
    s = 0
    for i in l:
        s += i**2
    return math.sqrt(s)

def foo(a, b, f):
    l = range(a)
    s = datetime.now()
    for i in range(b):
        f(l)
    e = datetime.now()
    return e-s

foo(10**4, 10**5, norm)
foo(10**4, 10**5, np.linalg.norm)
foo(10**2, 10**7, norm)
foo(10**2, 10**7, np.linalg.norm)

I got the following output:

0:00:43.156278
0:00:23.923239
0:00:44.184835
0:01:00.343875

It seems like when np.linalg.norm is called many times for small-sized data, it runs slower than my norm function.

What is the cause of that?


Solution

  • First of all: datetime.now() isn't appropriate to measure performance, it includes the wall-time and you may just pick a bad time (for your computer) when a high-priority process runs or Pythons GC kicks in, ...

    There are dedicated timing functions/modules available in Python: the built-in timeit module or %timeit in IPython/Jupyter and several other external modules (like perf, ...)

    Let's see what happens if I use these on your data:

    import numpy as np
    import math
    
    def norm(l):
        s = 0
        for i in l:
            s += i**2
        return math.sqrt(s)
    
    r1 = range(10**4)
    r2 = range(10**2)
    
    %timeit norm(r1)
    3.34 ms ± 150 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
    %timeit np.linalg.norm(r1)
    1.05 ms ± 3.92 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
    
    %timeit norm(r2)
    30.8 µs ± 1.53 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
    %timeit np.linalg.norm(r2)
    14.2 µs ± 313 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
    

    It isn't slower for short iterables it's still faster. However note that the real advantage from NumPy functions comes if you already have NumPy arrays:

    a1 = np.arange(10**4)
    a2 = np.arange(10**2)
    
    %timeit np.linalg.norm(a1)
    18.7 µs ± 539 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
    %timeit np.linalg.norm(a2)
    4.03 µs ± 157 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
    

    Yeah, it's quite a lot faster now. 18.7us vs. 1ms - almost 100 times faster for 10000 elements. That means most of the time of np.linalg.norm in your examples was spent in converting the range to a np.array.