I am processing an image in Python and need to compute a metric for "how much is happening in the image". The metric finds horizontal stripes (runs) of non-zero pixels in each row, sums the values inside each stripe, and then adds up the squares of those stripe sums. For example, in the row [0, 7, 9, 0, 0, 4, 2, 0] the stripes are (7, 9) and (4, 2) with sums 16 and 6, so the row would contribute 16² + 6² = 292 (before the minimum-sum threshold used below).
My naive implementation, which visits every pixel of the image, is painfully slow (as expected). What is the best way to rewrite it in a Pythonic way, ideally so it can utilize multiple CPUs under the hood?
merit = 0.0
for y in range(height):
    segment_sum = 0
    for x in range(width):
        if test_image[y, x] > 0:
            segment_sum += test_image[y, x]
        elif segment_sum > 0:
            if segment_sum > 1000:  # Ignore short segments
                merit += segment_sum * segment_sum
            segment_sum = 0
return merit
As stated in the comments, numba speeds up the code significantly:
from timeit import timeit
import numpy as np
from numba import njit
def get_merit(image):
    height, width = image.shape
    merit = 0.0
    for y in range(height):
        segment_sum = 0
        for x in range(width):
            if image[y, x] > 0:
                segment_sum += image[y, x]
            elif segment_sum > 0:
                if segment_sum > 1000:
                    merit += segment_sum * segment_sum  # Ignore short segments
                segment_sum = 0
    return merit

@njit
def get_merit_numba(image):
    height, width = image.shape
    merit = 0.0
    for y in range(height):
        segment_sum = 0
        for x in range(width):
            if image[y, x] > 0:
                segment_sum += image[y, x]
            elif segment_sum > 0:
                if segment_sum > 1000:
                    merit += segment_sum * segment_sum  # Ignore short segments
                segment_sum = 0
    return merit
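
# The question also asks about utilizing multiple CPUs. Each row is processed
# independently, so the outer loop can be parallelized with Numba's prange;
# merit is then a scalar reduction, which Numba's parallel mode supports.
# This is a sketch under those assumptions; get_merit_numba_parallel is a
# name introduced here and is not part of the benchmark below.
from numba import prange

@njit(parallel=True)
def get_merit_numba_parallel(image):
    height, width = image.shape
    merit = 0.0
    for y in prange(height):  # rows run in parallel across threads
        segment_sum = 0       # per-iteration variable, private to each thread
        for x in range(width):
            if image[y, x] > 0:
                segment_sum += image[y, x]
            elif segment_sum > 0:
                if segment_sum > 1000:
                    merit += segment_sum * segment_sum  # reduction handled by Numba
                segment_sum = 0
    return merit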
np.random.seed(42)
image = np.random.randint(0, 255, size=(1000, 1000), dtype=np.uint8)
assert get_merit(image) == get_merit_numba(image)
t1 = timeit("get_merit(image)", number=1, globals=globals())
t2 = timeit("get_merit_numba(image)", number=1, globals=globals())
print(f"Time normal = {t1}")
print(f"Time numba = {t2}")
On my machine (AMD Ryzen 7 5700X) this prints the following (note that the assert above already triggered the JIT compilation, so t2 measures only the compiled call):
Time normal = 1.1393451909534633
Time numba = 0.00026865419931709766
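If you want to stay in pure NumPy (the "Pythonic" rewrite the question asks about), the same metric can be vectorized with run-boundary detection and a prefix sum. This is only a sketch: get_merit_numpy and its threshold parameter are names introduced here, and it deliberately reproduces the loop's exact semantics, including the quirk that a stripe touching the right edge of a row is never counted (the loop never flushes segment_sum at the end of a row).

def get_merit_numpy(image, threshold=1000):
    h, w = image.shape
    # Append a sentinel zero column so every counted segment is closed by a
    # zero and rows cannot merge into one run when flattened.
    flat = np.zeros((h, w + 1), dtype=np.int64)
    flat[:, :w] = image
    flat = flat.ravel()
    nz = flat > 0
    # +1 marks a 0 -> nonzero transition (segment start); -1 marks a
    # nonzero -> 0 transition (index of the zero that closes the segment).
    d = np.diff(nz.astype(np.int8))
    starts = np.flatnonzero(d == 1) + 1
    ends = np.flatnonzero(d == -1) + 1
    if nz[0]:
        starts = np.concatenate(([0], starts))
    # Segment sums as differences of a prefix sum.
    csum = np.concatenate(([0], np.cumsum(flat)))
    seg_sums = csum[ends] - csum[starts]
    # Mimic the loop: drop segments closed only by the sentinel column,
    # i.e. stripes that touch the right edge of the image.
    seg_sums = seg_sums[ends % (w + 1) != w]
    seg_sums = seg_sums[seg_sums > threshold]
    return float(np.sum(seg_sums.astype(np.float64) ** 2))

In my experience this kind of vectorized version typically lands between the pure-Python loop and the Numba kernel in speed, and unlike @njit(parallel=True) it stays on a single core, so Numba is still the better fit for the multi-CPU part of the question.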