Tags: python, numpy, opencv

Pythonic optimization of per-pixel image processing


I am processing an image in Python and need to calculate a metric for "how much is happening in the image". The metric finds stripes (runs) of non-zero values in each row, computes the sum of the values inside each stripe, and then accumulates the sum of squares of those stripe sums.
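
For example, with the threshold of 1000 used in the code below, the row [0, 255, 255, 255, 255, 255, 0, 200, 0] contains two stripes with sums 1275 and 200; only 1275 exceeds the threshold, so that row contributes 1275² = 1,625,625.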

The naive implementation, which visits every pixel in Python, is painfully slow (as expected). What is the best way to rewrite it Pythonically, ideally so that it can utilize multiple CPUs under the hood?


def get_merit(test_image):
    height, width = test_image.shape
    merit = 0.0

    for y in range(height):
        segment_sum = 0
        for x in range(width):
            if test_image[y, x] > 0:
                segment_sum += test_image[y, x]
            elif segment_sum > 0:
                if segment_sum > 1000:  # Ignore short segments
                    merit += segment_sum * segment_sum
                segment_sum = 0

    return merit


Solution

  • As stated in the comments, compiling the loop with Numba's @njit speeds up the code significantly:

    from timeit import timeit
    
    import numpy as np
    from numba import njit
    
    
    def get_merit(image):
        height, width = image.shape
    
        merit = 0.0
    
        for y in range(height):
            segment_sum = 0
            for x in range(width):
                if image[y, x] > 0:
                    segment_sum += image[y, x]
                elif segment_sum > 0:
                    if segment_sum > 1000:
                        merit += segment_sum * segment_sum  # Ignore short segments
                    segment_sum = 0
    
        return merit
    
    
    @njit
    def get_merit_numba(image):
        height, width = image.shape
    
        merit = 0.0
    
        for y in range(height):
            segment_sum = 0
            for x in range(width):
                if image[y, x] > 0:
                    segment_sum += image[y, x]
                elif segment_sum > 0:
                    if segment_sum > 1000:
                        merit += segment_sum * segment_sum  # Ignore short segments
                    segment_sum = 0
    
        return merit
    
    
    np.random.seed(42)
    
    image = np.random.randint(0, 255, size=(1000, 1000), dtype=np.uint8)
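    # This call also triggers JIT compilation, so the timing below excludes compile time.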
    assert get_merit(image) == get_merit_numba(image)
    
    t1 = timeit("get_merit(image)", number=1, globals=globals())
    t2 = timeit("get_merit_numba(image)", number=1, globals=globals())
    
    print(f"Time normal = {t1}")
    print(f"Time numba  = {t2}")
    

    This prints on my machine (AMD Ryzen 7 5700X):

    Time normal = 1.1393451909534633
    Time numba  = 0.00026865419931709766
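
  • The question also asks about utilizing multiple CPUs. Since rows are processed independently, Numba can distribute the outer loop across cores with prange. This is a minimal sketch on top of the answer above (get_merit_parallel is a hypothetical name); note that for a 1000×1000 image the single-threaded compiled loop is already so fast that threading overhead may dominate:

    from numba import njit, prange


    @njit(parallel=True)
    def get_merit_parallel(image):
        height, width = image.shape

        merit = 0.0

        # Rows are independent, so prange splits them across CPU cores.
        for y in prange(height):
            segment_sum = 0
            row_merit = 0.0
            for x in range(width):
                if image[y, x] > 0:
                    segment_sum += image[y, x]
                elif segment_sum > 0:
                    if segment_sum > 1000:
                        row_merit += segment_sum * segment_sum
                    segment_sum = 0
            merit += row_merit  # Numba recognizes this as a parallel reduction

        return merit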
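
  • If you would rather stay in pure NumPy without an extra dependency, the run detection can be vectorized with diff and cumsum. A sketch, not part of the answer above (get_merit_vectorized is a hypothetical name); it reproduces the loop's exact semantics, including the fact that a segment reaching the last column of a row is never counted:

    import numpy as np


    def get_merit_vectorized(image):
        width = image.shape[1]

        # Pad one zero column so runs cannot continue across row boundaries.
        padded = np.pad(image, ((0, 0), (0, 1))).ravel()
        mask = (padded > 0).astype(np.int8)
        edges = np.diff(mask, prepend=0)  # +1 where a run starts, -1 one past its end
        starts = np.flatnonzero(edges == 1)
        ends = np.flatnonzero(edges == -1)

        # Sum of each run via a cumulative sum: sum(run) = csum[end] - csum[start].
        csum = np.concatenate(([0], np.cumsum(padded, dtype=np.int64)))
        run_sums = csum[ends] - csum[starts]

        # The loop never flushes a segment that reaches the end of a row, so drop
        # runs that are terminated only by the padding column to match it exactly.
        run_sums = run_sums[ends % (width + 1) != width]

        run_sums = run_sums[run_sums > 1000]  # Ignore short segments
        return float(np.sum(run_sums.astype(np.float64) ** 2))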