I struggled with this function to reduce a list of colors, and it turns out to be painfully slow: I needed 3 nested loops. It's OK with this short c_list, but useless with a bigger list. How can I improve this? The colormath library is a bit faster, but I would prefer scikit-image. I guess this will need some NumPy magic?
My head is already exploding.
import numpy as np
import copy
from skimage import color
c_list = [(226575, (0, 0, 0)), (18279, (10, 132, 85)), (15744, (4, 152, 111)), (8768, (85, 146, 40)), (7516, (0, 166, 129)), (7482, (136, 161, 12)),
(7092, (31, 127, 68)), (6612, (10, 47, 2)), (6304, (61, 135, 47)), (5524, (42, 97, 7)), (5202, (157, 167, 2)), (5153, (10, 32, 9)), (4807, (49, 132, 61)),
(3921, (48, 115, 3)), (3859, (19, 75, 1)), (3784, (115, 162, 38)), (3169, (103, 151, 30)), (2616, (60, 79, 43)), (2453, (113, 168, 92)), (2397, (45, 71, 0))]
def col_dist(color1, color2):
    # scale 0-255 RGB to 0-1, convert to Lab, return the CIEDE 2000 distance
    rgb_color1 = [x / 255 for x in color1]
    rgb_color2 = [x / 255 for x in color2]
    color1_lab = color.rgb2lab(rgb_color1)
    color2_lab = color.rgb2lab(rgb_color2)
    return color.deltaE_ciede2000(color2_lab, color1_lab)
def reduction_c2(color_in_list, max_colors, delta_start):
    # input c list, max number of colors, delta E starting value
    # returns reduced list with added pixelcounts
    color_in = [list(elem) for elem in color_in_list]
    fresh_list = color_in
    while len(color_in) > max_colors:
        i = 0
        j = 0
        color_in = copy.deepcopy(fresh_list)
        while j < len(color_in):
            c = color_in[j][1][0], color_in[j][1][1], color_in[j][1][2]
            while i + 1 < len(color_in):
                c_compare = color_in[i+1][1][0], color_in[i+1][1][1], color_in[i+1][1][2]
                delta_new = col_dist(c, c_compare)
                if delta_new <= delta_start:
                    color_in[j][0] = color_in[j][0] + color_in[i+1][0]
                    del color_in[i+1]
                else:
                    i += 1
            j += 1
            i = j
        delta_start += 1
    color_in.sort(reverse=True)
    return color_in
out2 = reduction_c2(c_list, 6, 2)
print("out new =", out2)
Since this algorithm ends up computing the distance from every color to every other color, I would suggest building an NxN matrix of all pairwise distances up front and indexing into that matrix instead of calling col_dist(). This alone is sufficient to get a 5x speedup on your example.
Also, I would suggest vectorizing the creation of this matrix. This gave me a 400x speedup.
def col_dist_all(colors):
    colors = np.array([color for count, color in colors]) / 255
    colors_lab = color.rgb2lab(colors)
    color_distances = color.deltaE_ciede2000(colors_lab[np.newaxis, :, :], colors_lab[:, np.newaxis, :])
    return color_distances
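As a quick standalone sanity check (the sample colors below are made up for illustration), the broadcasting trick produces a symmetric matrix whose entries agree with the pairwise col_dist() from the question:

```python
import numpy as np
from skimage import color

def col_dist(color1, color2):
    # pairwise version from the question
    lab1 = color.rgb2lab([x / 255 for x in color1])
    lab2 = color.rgb2lab([x / 255 for x in color2])
    return color.deltaE_ciede2000(lab2, lab1)

def col_dist_all(colors):
    # vectorized version: one (N, 3) Lab array, broadcast to an N x N matrix
    rgb = np.array([c for _, c in colors]) / 255
    lab = color.rgb2lab(rgb)
    return color.deltaE_ciede2000(lab[np.newaxis, :, :], lab[:, np.newaxis, :])

sample = [(1, (0, 0, 0)), (1, (10, 132, 85)), (1, (157, 167, 2))]
dist = col_dist_all(sample)
# dist is 3 x 3, zero on the diagonal, symmetric,
# and dist[i, j] matches col_dist(sample[i][1], sample[j][1])
```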
Note that when removing elements from your list of colors, the index of a color within the matrix no longer matches its index in the list. Instead of deleting elements of color_in_list, I would suggest setting them to None and skipping over them in later loops.
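Putting both suggestions together, a possible rewrite could look like the sketch below. reduction_c3 is my name for it, col_dist_all is repeated so the snippet runs standalone, and the merge order may differ slightly from the original, so the output is not guaranteed to be byte-for-byte identical to reduction_c2:

```python
import numpy as np
from skimage import color

def col_dist_all(colors):
    # same vectorized distance matrix as above, repeated to be self-contained
    rgb = np.array([c for _, c in colors]) / 255
    lab = color.rgb2lab(rgb)
    return color.deltaE_ciede2000(lab[np.newaxis, :, :], lab[:, np.newaxis, :])

def reduction_c3(color_in_list, max_colors, delta_start):
    # same greedy merge idea as reduction_c2, but distances come from the
    # precomputed matrix, and merged entries are marked None instead of
    # deleted so the matrix indices stay valid
    dist = col_dist_all(color_in_list)
    n = len(color_in_list)
    delta = delta_start
    while True:
        counts = [cnt for cnt, _ in color_in_list]  # fresh copy each pass
        for j in range(n):
            if counts[j] is None:
                continue
            for k in range(j + 1, n):
                if counts[k] is not None and dist[j, k] <= delta:
                    counts[j] += counts[k]  # merge pixel count into color j
                    counts[k] = None        # mark as merged, keep the index
        merged = [[counts[i], list(color_in_list[i][1])]
                  for i in range(n) if counts[i] is not None]
        if len(merged) <= max_colors:
            merged.sort(reverse=True)
            return merged
        delta += 1  # threshold too tight, loosen it and restart
```

Calling reduction_c3(c_list, 6, 2) then returns at most six [count, color] entries, and skimage is only touched once, for the N x N matrix.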
If you are starting with a very large number of colors, you may want to try k-means clustering with a euclidean metric in RGB color space first, to reduce to a manageable number of colors. I found that 0.5 * np.linalg.norm(c1 - c2) was always within a factor of 0.3 to 1 of the true CIEDE 2000 distance, and euclidean distance is much faster to compute.
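A sketch of that pre-reduction step, assuming scikit-learn is available (kmeans_prereduce is a hypothetical helper of mine, and merging each cluster into its count-weighted mean color is my choice, not part of the original code):

```python
import numpy as np
from sklearn.cluster import KMeans  # assumption: scikit-learn is installed

def kmeans_prereduce(color_in_list, n_clusters, seed=0):
    # cluster (count, (r, g, b)) entries in plain RGB space, weighting each
    # color by its pixel count, and return one merged entry per cluster
    counts = np.array([cnt for cnt, _ in color_in_list])
    rgb = np.array([c for _, c in color_in_list], dtype=float)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = km.fit_predict(rgb, sample_weight=counts)
    out = []
    for k in range(n_clusters):
        mask = labels == k
        if not mask.any():
            continue
        total = counts[mask].sum()
        # count-weighted mean color of the cluster; total pixel count preserved
        mean_rgb = tuple(np.round(
            (rgb[mask] * counts[mask][:, None]).sum(axis=0) / total).astype(int))
        out.append((int(total), mean_rgb))
    out.sort(reverse=True)
    return out
```

The result can then be fed into the exact CIEDE 2000 reduction, which now only has to handle n_clusters colors instead of the full list.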