Most dominant color in RGB image - OpenCV / NumPy / Python

I have a Python image-processing function that tries to get the dominant color of an image. I make use of a function I found here

It works, but unfortunately I don't quite understand what it does. I have also learned that np.histogram is rather slow and that I should use cv2.calcHist instead, since it is 40x faster according to this:

I'd like to understand how I have to update the code to use cv2.calcHist, or better, which values I have to pass to it.

My function

def centroid_histogram(clt):
    # grab the number of different clusters and create a histogram
    # based on the number of pixels assigned to each cluster
    num_labels = np.arange(0, len(np.unique(clt.labels_)) + 1)
    (hist, _) = np.histogram(clt.labels_, bins=num_labels)

    # normalize the histogram, such that it sums to one
    hist = hist.astype("float")
    hist /= hist.sum()

    # return the histogram
    return hist

The pprint of clt is below; I'm not sure whether it helps:

KMeans(algorithm='auto', copy_x=True, init='k-means++', max_iter=300,
    n_clusters=1, n_init=10, n_jobs=1, precompute_distances='auto',
    random_state=None, tol=0.0001, verbose=0)

My code can be found here:

I am a complete beginner, so any help is highly appreciated.

As per request:

Sample Image


Most dominant color:


Computation time for the Histogram:



  • Two approaches to getting the most dominant color can be suggested, one using np.unique and one using np.bincount. The linked page also mentions bincount as a faster alternative, so that could be the way to go.

    Approach #1

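    import numpy as np

    def unique_count_app(a):
        # Flatten to one row per pixel, count each distinct color, return the most frequent
        colors, count = np.unique(a.reshape(-1,a.shape[-1]), axis=0, return_counts=True)
        return colors[count.argmax()]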

    Approach #2

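    def bincount_app(a):
        # One row per pixel
        a2D = a.reshape(-1,a.shape[-1])
        col_range = (256, 256, 256) # generically : a2D.max(0)+1
        # Encode each color as a single integer index
        a1D = np.ravel_multi_index(a2D.T, col_range)
        # Count the indices and decode the most frequent one back to a color
        return np.unravel_index(np.bincount(a1D).argmax(), col_range)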
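To see why Approach #2 works, it may help to trace the encoding on a tiny input. The sketch below assumes a hypothetical 2 x 2 image (`tiny`, made up for illustration) with channel values in 0..255. Each color is mapped to the single integer `r*256*256 + g*256 + b`, those integers are counted with np.bincount, and the most frequent one is decoded back into a color.

```python
import numpy as np

# Hypothetical 2 x 2 RGB image, made up for illustration:
# three pixels are (10, 20, 30) and one is (2, 100, 50).
tiny = np.array([[[10, 20, 30], [2, 100, 50]],
                 [[10, 20, 30], [10, 20, 30]]])

pixels = tiny.reshape(-1, 3)      # one row per pixel
col_range = (256, 256, 256)

# Each (r, g, b) row becomes the single integer r*256*256 + g*256 + b.
codes = np.ravel_multi_index(pixels.T, col_range)
manual = pixels[:, 0] * 256 * 256 + pixels[:, 1] * 256 + pixels[:, 2]

# Count how often each integer occurs and decode the most frequent one.
winner = np.bincount(codes).argmax()
dominant = tuple(int(c) for c in np.unravel_index(winner, col_range))
print(dominant)
```

Note that np.bincount allocates one counter per possible code up to the largest one present, so the counting array can reach 256**3 entries for a full-range image.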

    Verification and timings on a 1000 x 1000 color image with values in the dense range [0,9), seeded for reproducible results -

    In [28]: np.random.seed(0)
        ...: a = np.random.randint(0,9,(1000,1000,3))
        ...: print unique_count_app(a)
        ...: print bincount_app(a)
    [4 7 2]
    (4, 7, 2)
    In [29]: %timeit unique_count_app(a)
    1 loop, best of 3: 820 ms per loop
    In [30]: %timeit bincount_app(a)
    100 loops, best of 3: 11.7 ms per loop

    Further boost

    For large data, a further boost comes from leveraging multiple cores with the numexpr module -

    import numexpr as ne
    def bincount_numexpr_app(a):
        a2D = a.reshape(-1,a.shape[-1])
        col_range = (256, 256, 256) # generically : a2D.max(0)+1
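        # s0 and s1 are the scale factors referenced in the expression below
        eval_params = {'a0':a2D[:,0],'a1':a2D[:,1],'a2':a2D[:,2],
                       's0':col_range[0],'s1':col_range[1]}
        # Matches np.ravel_multi_index here because all three ranges are equal
        a1D = ne.evaluate('a0*s0*s1+a1*s0+a2',eval_params)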
        return np.unravel_index(np.bincount(a1D).argmax(), col_range)

    Timings -

    In [90]: np.random.seed(0)
        ...: a = np.random.randint(0,9,(1000,1000,3))
    In [91]: %timeit unique_count_app(a)
        ...: %timeit bincount_app(a)
        ...: %timeit bincount_numexpr_app(a)
    1 loop, best of 3: 843 ms per loop
    100 loops, best of 3: 12 ms per loop
    100 loops, best of 3: 8.94 ms per loop
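
Coming back to the `centroid_histogram` function in the question: `clt.labels_` holds one small non-negative cluster index per pixel, so the same bincount idea can stand in for np.histogram there. This is a minimal sketch, assuming the labels run from 0 to n_clusters - 1; the name `centroid_histogram_bincount` and the sample `labels` array are made up for illustration, and it has not been timed against the original.

```python
import numpy as np

def centroid_histogram_bincount(labels, n_clusters=None):
    """Fraction of pixels assigned to each cluster (hypothetical helper)."""
    labels = np.asarray(labels)
    if n_clusters is None:
        # Assumption: labels are 0 .. n_clusters - 1, so the max gives the count
        n_clusters = int(labels.max()) + 1
    # One counter per cluster; minlength keeps empty clusters as zeros
    counts = np.bincount(labels, minlength=n_clusters)
    # Normalize so the histogram sums to one
    return counts / counts.sum()

# Made-up labels standing in for clt.labels_
labels = np.array([0, 2, 2, 1, 2, 0, 2, 2])
hist = centroid_histogram_bincount(labels, n_clusters=3)
print(hist)
```

With a fitted KMeans object, the call would be `centroid_histogram_bincount(clt.labels_, clt.n_clusters)`.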