
NumPy loop performance optimization in Python


I struggled with this function to reduce a list of colors, and it turned out to be painfully slow. Well, I ended up needing 3 nested loops…

It's OK with this short c_list, but useless with a bigger list. How can I improve this? The colormath library is a bit faster, but I would prefer skimage. I guess this will need some NumPy magic?

My head is already exploding.

import numpy as np
import copy
from skimage import color 

c_list =  [(226575, (0, 0, 0)), (18279, (10, 132, 85)), (15744, (4, 152, 111)), (8768, (85, 146, 40)), (7516, (0, 166, 129)), (7482, (136, 161, 12)),
           (7092, (31, 127, 68)), (6612, (10, 47, 2)), (6304, (61, 135, 47)), (5524, (42, 97, 7)), (5202, (157, 167, 2)), (5153, (10, 32, 9)), (4807, (49, 132, 61)),
             (3921, (48, 115, 3)), (3859, (19, 75, 1)), (3784, (115, 162, 38)), (3169, (103, 151, 30)), (2616, (60, 79, 43)), (2453, (113, 168, 92)), (2397, (45, 71, 0))]


def col_dist(color1, color2):
    rgb_color1 = [x / 255 for x in color1]
    rgb_color2 = [x / 255 for x in color2]
    color1_lab = color.rgb2lab(rgb_color1)
    color2_lab = color.rgb2lab(rgb_color2)
    return color.deltaE_ciede2000(color2_lab, color1_lab)       


def reduction_c2(color_in_list, max_colors, delta_start):      
    # input c list, max number of colors, delta E starting value
    # returns reduced list with added pixelcounts
    color_in = [list(elem) for elem in color_in_list]
    fresh_list = color_in
    while (len(color_in) > max_colors):
        i = 0
        j = 0
        color_in = copy.deepcopy(fresh_list)

        while j < len(color_in):
            c = color_in[j][1][0], color_in[j][1][1], color_in[j][1][2]

            while i+1 < len(color_in):
                c_compare = color_in[i+1][1][0], color_in[i+1][1][1], color_in[i+1][1][2]
                delta_new = col_dist(c, c_compare)
                if delta_new <= delta_start:
                    color_in[j][0] = color_in[j][0] + color_in[i+1][0]
                    del color_in[i+1]
                else:
                    i += 1
            j += 1
            i = j 
        delta_start += 1
    color_in.sort(reverse=True)
    return color_in


out2 = reduction_c2(c_list, 6, 2)
print("out new = ", out2 )

Solution

  • Since this algorithm is going to end up computing the distance from every color to every other color, I would suggest creating an NxN matrix of every distance, and using this matrix instead of calling col_dist(). This is sufficient to get a 5x speedup on your example.

    Also, I would suggest using vectorization when creating this matrix. This gave me a 400x speedup.

    def col_dist_all(colors):
        colors = np.array([color for count, color in colors]) / 255
        colors_lab = color.rgb2lab(colors)
        color_distances = color.deltaE_ciede2000(colors_lab[np.newaxis, :, :], colors_lab[:, np.newaxis, :])
        return color_distances
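
    As a quick sanity check, the vectorized matrix can be compared against the question's scalar col_dist on a few colors (this snippet redefines both functions so it runs standalone):

```python
import numpy as np
from skimage import color

def col_dist(color1, color2):
    # scalar version from the question
    lab1 = color.rgb2lab([x / 255 for x in color1])
    lab2 = color.rgb2lab([x / 255 for x in color2])
    return color.deltaE_ciede2000(lab2, lab1)

def col_dist_all(colors):
    # vectorized version: all pairwise CIEDE2000 distances at once
    rgb = np.array([c for count, c in colors]) / 255
    lab = color.rgb2lab(rgb)
    return color.deltaE_ciede2000(lab[np.newaxis, :, :], lab[:, np.newaxis, :])

sample = [(1, (0, 0, 0)), (1, (10, 132, 85)), (1, (157, 167, 2))]
m = col_dist_all(sample)
# every matrix entry should match the pairwise scalar call
for a in range(3):
    for b in range(3):
        assert abs(m[a, b] - col_dist(sample[a][1], sample[b][1])) < 1e-6
```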
    

    Note that when removing elements from your list of colors, the list indices stop matching the rows and columns of the matrix. Instead of deleting elements of color_in_list, I would suggest flagging them (e.g. setting them to None) and skipping over them in future loops.
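
    Putting both ideas together, the reduction loop can keep the original merge order but work against the precomputed matrix, using a boolean "alive" mask instead of del so the matrix indices never shift. This is a sketch, not a drop-in replacement: reduction_c2_fast is a hypothetical name, and restarting the counts on every pass mirrors the deepcopy in the original.

```python
import numpy as np
from skimage import color

def col_dist_all(colors):
    # one N x N matrix of pairwise CIEDE2000 distances, computed once
    rgb = np.array([c for count, c in colors]) / 255
    lab = color.rgb2lab(rgb)
    return color.deltaE_ciede2000(lab[np.newaxis, :, :], lab[:, np.newaxis, :])

def reduction_c2_fast(color_in_list, max_colors, delta_start):
    dist = col_dist_all(color_in_list)      # indices stay valid for the whole run
    n = len(color_in_list)
    while True:
        counts = [count for count, c in color_in_list]  # fresh counts each pass
        alive = [True] * n                              # replaces del / deepcopy
        for j in range(n):
            if not alive[j]:
                continue
            for i in range(j + 1, n):
                if alive[i] and dist[j, i] <= delta_start:
                    counts[j] += counts[i]              # absorb the close color
                    alive[i] = False
        if sum(alive) <= max_colors:
            break
        delta_start += 1                                # widen tolerance, retry
    out = [[counts[k], list(color_in_list[k][1])] for k in range(n) if alive[k]]
    out.sort(reverse=True)
    return out

# small usage example (subset of the colors from the question)
sample = [(226575, (0, 0, 0)), (18279, (10, 132, 85)), (15744, (4, 152, 111)),
          (8768, (85, 146, 40)), (7516, (0, 166, 129)), (7482, (136, 161, 12)),
          (7092, (31, 127, 68)), (6612, (10, 47, 2))]
reduced = reduction_c2_fast(sample, 4, 2)
```

    One behavioural difference worth noting: this version always runs at least one pass, whereas the original returns the input untouched (apart from sorting) when it already has few enough colors.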

    If you are starting with a very large number of colors, you may want to try KMeans clustering with a euclidean metric in RGB color space, to reduce to a manageable number of colors. I found that 0.5 * np.linalg.norm(c1 - c2) was always within a factor of 0.3 to 1 of the true CIEDE 2000 distance, and euclidean distance would be much faster to calculate.
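
    For that pre-reduction step, a count-weighted KMeans in plain RGB could look like the sketch below. This assumes scikit-learn is available (it is not imported anywhere in the question), and kmeans_prereduce is a hypothetical helper name; the CIEDE2000 reduction can then run on the much shorter output list.

```python
import numpy as np
from sklearn.cluster import KMeans

def kmeans_prereduce(color_in_list, n_clusters):
    # cluster in RGB with a euclidean metric; pixel counts weight the fit
    counts = np.array([count for count, c in color_in_list])
    rgb = np.array([c for count, c in color_in_list], dtype=float)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    labels = km.fit_predict(rgb, sample_weight=counts)
    out = []
    for k in range(n_clusters):
        mask = labels == k
        if not mask.any():
            continue
        w = counts[mask]
        # represent each cluster by the count-weighted mean of its colors
        mean_rgb = (rgb[mask] * w[:, None]).sum(axis=0) / w.sum()
        out.append((int(w.sum()), tuple(int(round(v)) for v in mean_rgb)))
    out.sort(reverse=True)
    return out

# usage: two dark colors and two reds should collapse to two entries
sample = [(10, (0, 0, 0)), (5, (2, 2, 2)), (7, (200, 10, 10)), (3, (198, 12, 12))]
pre = kmeans_prereduce(sample, 2)
```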