Search code examples
pythonalgorithmtime-complexitymatrix-multiplication

Does matrix multiplication time-complexity only apply to large N?


The time complexity of (square, naive) matrix multiplication is O(N3), e.g. this answer

I can run a quick script

import numpy as np 
import time 
import matplotlib.pyplot as plt 

def time_matmul_n(n):

    #Create two arrays of dimension n
    arr1 = np.random.rand(n, n) 
    arr2 = np.random.rand(n, n) 


    t0=time.process_time_ns() #start the clock
    np.matmul(arr1,arr2)
    t1=time.process_time_ns()

    return t1-t0 

N = 100

n_values = range(N)
t_values = np.zeros_like(n_values)

for i,n in enumerate(n_values):
     t_values[i] = time_matmul_n(n)

#Plot the results
plt.plot(n_values,t_values)

This gives me something like

enter image description here

Now I appreciate there is a lot of noise here, and the run time will vary depending on the processes that are going on on my laptop, but this doesn't look like it scales anything like O(N3).

I am clearly not understanding something. Can anyone explain why the graph doesn't look O(N3) -like? Does this only matter when N gets large?


Solution

  • Can anyone explain why the graph doesn't look O(N3). Does this only matter when N gets large?

    There are a few reasons:

    • Time complexity is about asymptotic behaviour. By definition this will only become apparent for values of N that are larger than some minimum value to be determined.

    • For sizes that are too small, the process is trivial and only takes a millisecond or less, so that your time measurements are more determined by other overhead.

    • Related: increasing the input size with steps of 1 limits the range of N for which you visualise the result. You can improve on this by taking larger (but constant) increments for N.

    • As random inputs may even give different results for the same value of N, it is expected you get erratic results. To improve on that, repeat the operation for several random inputs with the same N and then take the average of the time it took.

    Here is how your script could be modified to generate more telling results:

    import numpy as np 
    import time 
    import matplotlib.pyplot as plt 
    
    def time_matmul_n(n):
        acc = 0
        for i in range(10):  # Multiple times to take average
            arr1 = np.random.rand(n, n) 
            arr2 = np.random.rand(n, n) 
            t0 = time.process_time_ns()
            np.matmul(arr1, arr2)
            t1 = time.process_time_ns()
            acc += t1-t0 
        return acc / 10
    
    # Larger sizes, increase with steps
    N = 1500
    step = 50
    
    n_values = range(step, N, step)
    t_values = np.zeros_like(n_values)
    
    # This takes a while to run
    for i,n in enumerate(n_values):
        t_values[i] = time_matmul_n(n)
        print(i, t_values[i])  # Give some indication of progress
    
    # Plot the results
    plt.plot(n_values,t_values)
    

    Now you'll get a graph that looks something like this:

    enter image description here