Why does Numba take longer to execute NumPy calculations than plain Python code?

parallel.py is a Python file that uses Numba and NumPy to calculate the sum of the primary diagonals of two matrices. The main intention is to measure the execution speed with Numba. parallel.py takes around 0.55 seconds to finish, while the same code in another file (sequencial.py), written in plain Python, takes 0.00 seconds to solve the same problem, which is ironic. I am not sure I am making good use of Numba. Can someone please suggest what I need to do to achieve my objective?

parallel.py

from numba import jit, njit
import numpy as np
import time

@jit(nopython=True)
def create_matrix(row, col):
    arr = np.zeros((row, col))
    for i in range(row):
        for j in range(1, col + 1):
            arr[i, j - 1] = j + (col * i)
    return arr


print("FIND THE SUM OF PRIMARY DIAGONALS OF ANY TWO MATRICES: ")

start = time.perf_counter()

# calculate the sum of primary diagonals of matrix1
m1 = create_matrix(4, 4)  # you can adjust the size of the matrix by changing the row and column in brackets
print(f"Matrix 1 : {m1}")
print(f"Matrix 1 diagonal: {np.diagonal(m1)}")
print(f"Matrix 1 sum of primary diagonal is : {np.trace(m1)}")
mat1_sum = np.trace(m1)

# calculate the sum of primary diagonals of matrix2
m2 = create_matrix(4, 4)  # you can adjust the size of the matrix by changing the row and column in brackets
print(f"Matrix 2 : {m2}")
print(f"Matrix 2 diagonal : {np.diagonal(m2)}")
print(f"Matrix 2 Sum of diagonal is : {np.trace(m2)}")
mat2_sum = np.trace(m2, dtype='i')

sum_of_two_diagonals = mat1_sum + mat2_sum
print(f"THE SUM IS :  {sum_of_two_diagonals}")

finish = time.perf_counter()
print(f"Finished in {round(finish - start, 2)} seconds(s)")

sequencial.py

import numpy as np
import time

def create_matrix(row, col):
    arr = np.zeros((row, col))
    for i in range(row):
        for j in range(1, col + 1):
            arr[i, j - 1] = j + (col * i)
    return arr

print("FIND THE SUM OF PRIMARY DIAGONALS OF ANY TWO MATRICES: ")

start =  time.perf_counter()

# calculate the sum of primary diagonals of matrix1
mat_1 = create_matrix(4, 4) # you can adjust the size of the matrix by changing the row and column in brackets
print(f"Matrix 1 : {mat_1}")
mat1_sum_of_primary_diagonal = 0
for i in range(len(mat_1)):
    for j in range(len(mat_1[i])):
        if i == j:
             print(mat_1[i][j])
             mat1_sum_of_primary_diagonal = mat1_sum_of_primary_diagonal + mat_1[i][j]

print(f"Matrix 1 sum of diagonals is: {mat1_sum_of_primary_diagonal}")

# calculate the sum of primary diagonals of matrix2
mat_2 = create_matrix(4, 4) # you can adjust the size of the matrix by changing the row and column in brackets
print(f"Matrix 2 : {mat_2}")
mat2_sum_of_primary_diagonal = 0
for i in range(len(mat_2)):
    for j in range(len(mat_2[i])):
        if i == j:
             print(mat_2[i][j])
             mat2_sum_of_primary_diagonal = mat2_sum_of_primary_diagonal + mat_2[i][j]

print(f"Matrix 2 sum of diagonals is: {mat2_sum_of_primary_diagonal}")

diagonals_total = mat1_sum_of_primary_diagonal + mat2_sum_of_primary_diagonal
print(f"THE SUM IS :  {diagonals_total}")

finish = time.perf_counter()
print(f"Finished in {round(finish - start, 2)} seconds(s)")

Solution

The compilation time of the Numba function is included in the benchmark because Numba uses lazy compilation: the function is compiled the first time it is called. You can specify the types of the function arguments in the decorator to compile it eagerly. Alternatively, you can run the benchmark twice and only take the second run into account.

Here is an example:

import numba as nb
import numpy as np

@nb.njit('float64[:,::1](int_, int_)')
def create_matrix(row, col):
    arr = np.zeros((row, col))
    for i in range(row):
        for j in range(1, col + 1):
            arr[i, j - 1] = j + (col * i)
    return arr

Moreover, note that it is better not to include print calls in the benchmark timings: their duration is unlikely to be stable, and it is probably not what you want to measure. Printing is also generally slow compared to basic computations.
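
As a rough sketch of what that could look like, the snippet below times only the computation with the standard `timeit` module and prints the result afterwards. The `diagonal_total` helper is a name invented here for illustration, and the plain (non-Numba) `create_matrix` is used so the snippet runs on its own.

```python
import timeit

import numpy as np


def create_matrix(row, col):
    # Same fill pattern as in the question: values 1..row*col in row-major order.
    arr = np.zeros((row, col))
    for i in range(row):
        for j in range(1, col + 1):
            arr[i, j - 1] = j + (col * i)
    return arr


def diagonal_total(n):
    """Hypothetical helper: the computation only, with no print calls."""
    return np.trace(create_matrix(n, n)) + np.trace(create_matrix(n, n))


# 5 repetitions of 1000 calls each; the minimum is usually the least noisy.
timings = timeit.repeat(lambda: diagonal_total(4), number=1000, repeat=5)
best_per_call = min(timings) / 1000

total = diagonal_total(4)  # computed once, printed outside the timed region
print(f"THE SUM IS : {total}")
print(f"best time per call: {best_per_call:.2e} s")
```

If you swap in the Numba-compiled `create_matrix`, call it once before timing so that compilation stays out of the measurement.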

Finally, note that the script is called "parallel.py" but nothing actually runs in parallel, since Numba does not parallelize code by default (and it would be slower in your case anyway due to the overhead of creating threads).