I'm trying to count the number of times a colour (or a closest match to one of 17 colours) appears in the pixels of an image (given as 300x300x3 np array, float values [0,1]). I've written this but it seems to be extremely inefficient:
for w in range(300):
for h in range(300):
colordistance = float('inf')
colorindex = 0
for c in range(17):
r1 = color[c, 0]
g1 = color[c, 1]
b1 = color[c, 2]
r2 = img[w, h, 0]
g2 = img[w, h, 1]
b2 = img[w, h, 2]
distance = math.sqrt(
((r2-r1)*0.3)**2 + ((g2-g1)*0.59)**2 + ((b2-b1)*0.11)**2)
if distance < colordistance:
colordistance = distance
colorindex = c
colorcounters[colorindex] = colorcounters[colorindex] + 1
Are there any ways I can improve the efficiency of this bit? I'm already using multiprocessing for an outer loop.
You mentioned you're using numpy, so you should avoid iterating as much as possible. My vectorized implementation is about 40x faster. I made a few changes to your code so they could use the same arrays and so that I could verify correctness. This may affect the speed.
import numpy as np
import time
import math
num_images = 1
hw = 300 # height and width
nc = 17 # number of colors
img = np.random.random((num_images, hw, hw, 1, 3))
colors = np.random.random((1, 1, 1, nc, 3))
## NUMPY IMPLEMENTATION
t = time.time()
dist_sq = np.sum(((img - colors) * [0.3, 0.59, 0.11]) ** 2, axis=4) # calculate (distance * coefficients) ** 2
largest_color = np.argmin(dist_sq, axis=3) # find the minimum
color_counters = np.unique(largest_color, return_counts=True) # count
print(color_counters)
# should return an object like [[1, 2, 3, ... 17], [count1, count2, count3, ...]]
print("took {} s".format(time.time() - t))
## REFERENCE IMPLEMENTATION
t = time.time()
colorcounters = [0 for i in range(nc)]
for i in range(num_images):
for h in range(hw):
for w in range(hw):
colordistance = float('inf')
colorindex = 0
for c in range(nc):
r1 = colors[0, 0, 0, c, 0]
g1 = colors[0, 0, 0, c, 1]
b1 = colors[0, 0, 0, c, 2]
r2 = img[i, w, h, 0, 0]
g2 = img[i, w, h, 0, 1]
b2 = img[i, w, h, 0, 2]
# dist_sq
distance = math.sqrt(((r2-r1)*0.3)**2 + ((g2-g1)*0.59)**2 + ((b2-b1)*0.11)**2) # not using sqrt gives a 14% improvement
# largest_color
if distance < colordistance:
colordistance = distance
colorindex = c
# color_counters
colorcounters[colorindex] = colorcounters[colorindex] + 1
print(colorcounters)
print("took {} s".format(time.time() - t))