Tags: python, numpy, matplotlib, scikit-learn, scatter

How to color clusters in scatter plot using an array?


I'm using sklearn's noisy_circles dataset (generated with make_circles) with 1500 points, as in the linked example, to create 2 circles.
I have an array of point references in which each distinct value should get its own cluster color:

nbrs_array = [ 88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,
        88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,
        88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,
        88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,
        88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,
        88,  88,  88,  88,  88,  88, 973,  88,  88,  88,  88,  88,  88,
        88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,
        88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,
        88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,
        88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,
        88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,
        88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,
        88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,
        88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,
        88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,
        88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,
        88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,
        88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,
        88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,
        88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,
       973,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,
        88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,
        88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,
        88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,
        88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,
        88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,
        88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,
        88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,
        88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,
        88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,
        88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,
        88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,
        88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,
        88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,
        88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,
        88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,
        88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,
        88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,
        88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,
        88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,
        88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,
        88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,
        88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,
        88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,
        88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,
        88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,
        88,  88,  88,  88,  88,  88, 984,  88,  88,  88,  88,  88,  88,
        88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,
        88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,
        88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,
        88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,
        88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,
        88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,
        88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,
        88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,
        88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,
        88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,
        88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,
        88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,
        88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,
        88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,
        88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,
        88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,
        88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,
        88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,
        88,  88,  88,  88,  88, 992,  88,  88,  88,  88,  88,  88,  88,
        88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,
        88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,
        88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,
        88,  88,  88,  88,  88, 972,  88,  88,  88,  88,  88,  88,  88,
        88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,
        88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,
        88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,
        88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,
        88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,
        88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,
        88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,
        88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,
        88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,
        88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,
        88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,
        88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,
        88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,
        88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,
        88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,
        88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,
        88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,
        88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,
        88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,
        88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,
        88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,
        88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,
        88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,
        88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,
        88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,
        88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,
        88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,
        88,  88,  88,  88,  88, 992,  88,  88,  88,  88,  88,  88,  88,
        88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,
        88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,
        88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,
        88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,
        88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,
        88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,
        88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,
        88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,
        88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,
        88,  88,  88,  88,  88,  88,  88,  88, 984,  88,  88,  88,  88,
        88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,
        88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,
        88,  88,  88,  88,  88,  88,  88,  88,  88,  88, 972,  88,  88,
        88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,
        88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,
        88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,
        88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,  88,
        88,  88,  88,  88,  88]

There are 5 unique values in the array (88, 972, 973, 984 and 992), so there should be 5 different colors. But when I plot it using:

plt.figure(figsize=(10,10))
plt.scatter(x,y, c=nbrs_array)
plt.show()

The output is: [image: the resulting scatter plot]


Solution

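  • Something like this should do the job. It maps each unique value to its own evenly spaced colormap color and passes the resulting list of colors to scatter: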

    import numpy as np
    from matplotlib import pyplot as plt
    from matplotlib.colors import Normalize, to_hex
    from sklearn import datasets
    
    
    def get_colors(arr, cmap='viridis'):
        # plt.get_cmap works on current Matplotlib; cm.get_cmap was removed in 3.9
        cmap = plt.get_cmap(cmap)
        labels = np.unique(arr)
        # Spread the n unique labels evenly over the colormap (positions 1..n)
        colornorm = Normalize(vmin=1, vmax=len(labels))
        hex_map = dict()
        for i, cl in enumerate(labels):
            hex_map[cl] = to_hex(cmap(colornorm(i + 1)))
        # One hex color per point, in the original order
        return [hex_map[x] for x in arr]
    
    
    n_samples = 1500
    data, _ = datasets.make_circles(n_samples=n_samples, factor=.5, noise=.05)
    x, y = data[:, 0], data[:, 1]
    nbrs_array = [...]  # the array from the question
    
    plt.figure(figsize=(10, 10))
    plt.scatter(x, y, c=get_colors(nbrs_array))
    plt.show()
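
A likely reason the original call looks wrong is that scatter scales numeric c values linearly between their minimum and maximum, so 972, 973, 984 and 992 all land at the very top of the colormap while 88 sits at the bottom. The sketch below is an alternative to the hex mapping above, not part of the original answer: it replaces each value by its rank among the unique values and lets scatter color those ranks with a resampled colormap. The helper names (default_positions, label_codes, plot_clusters) and the shortened sample list are made up for illustration.

```python
import numpy as np


def default_positions(values):
    """Colormap positions (0..1) of the unique values under linear min-max
    scaling. Assumes at least two distinct values."""
    u = np.unique(values).astype(float)
    return (u - u.min()) / (u.max() - u.min())


def label_codes(values):
    """Replace each label by its rank among the sorted unique labels."""
    uniques, codes = np.unique(np.asarray(values), return_inverse=True)
    return uniques, codes.ravel()


def plot_clusters(x, y, values, cmap_name="viridis"):
    """Scatter plot with one evenly spaced colormap color per unique label."""
    # Imported here so the two helpers above need only NumPy
    from matplotlib import pyplot as plt

    uniques, codes = label_codes(values)
    # Resample the colormap to exactly one color per unique label
    cmap = plt.get_cmap(cmap_name, len(uniques))
    plt.figure(figsize=(10, 10))
    plt.scatter(x, y, c=codes, cmap=cmap)
    plt.show()


# Shortened stand-in for nbrs_array with the same five unique values
sample = [88, 88, 973, 88, 984, 992, 972, 88]

# 88 maps to 0.0; the other four values all map to roughly 0.98 or above
print(default_positions(sample))

uniques, codes = label_codes(sample)
print(uniques, codes)
```

With the x, y and nbrs_array from the question, plot_clusters(x, y, nbrs_array) should draw each of the five values in its own color. Passing integer codes instead of hex strings also keeps the option of adding a colorbar later.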