python, matplotlib, labeling

Coloring in matplotlib scatter plot does not obey the predefined color sequence of my ListedColormap(cmap)


I have an issue when trying to use a predefined color sequence for the labels of my data. I pass the labels to the c parameter of scatter and use cmap=ListedColormap(km_colors) to color them according to my list of colors. However, the colormap seems to decide for itself how to color the labeled data: with two classes, label=1 is colored black (which is in my list, but not at that position), and label=0 gets what seems to be the lightest(?) color in my list. So it does not follow the order of the colors I set.

For example, in the code below, even though km_colors[1]='cyan', label=1 is drawn in black.

Thanks a lot for any help in advance.

import matplotlib.pyplot as plt
import matplotlib.colors as mcolors

# data_list, n_Enm_clusters, PCA_bold and km_Enm_labels_list are defined elsewhere (not shown)
km_colors = ['green', 'cyan', 'brown', 'darkorange', 'purple', 'black']
fig, ax = plt.subplots(3, 3, sharex='col', figsize=(10, 8))

for i in range(len(data_list)):
    for j in range(len(n_Enm_clusters)):
        ### c = [km_colors[int(l)] for k,l in enumerate(km_Enm_labels_list[i][j])]
        data_PCA = ax[j, i].scatter(PCA_bold[i][:, 0],
                                    PCA_bold[i][:, 1],
                                    c=km_Enm_labels_list[i][j], s=15,
                                    cmap=mcolors.ListedColormap(km_colors),
                                    alpha=0.5)

        # produce a legend with the unique colors from the scatter
        if i == len(data_list) - 1:
            legend1 = ax[j, i].legend(*data_PCA.legend_elements(),
                                      loc="lower right", title="edge \n classes", prop={'size': 6})
            ax[j, i].add_artist(legend1)

plt.tight_layout()
plt.show()


[Figure: PCA data]


Solution

  • When a list of colors is passed to ListedColormap() and numeric labels are passed as c, the labels are NOT used as indices into that list. scatter first rescales the values in c linearly to the range 0 to 1 (from their minimum to their maximum) and then spreads that range across all the colors of the colormap. So, if there are 6 colors in the list and...

    • your data has one label, it picks the first color,
    • your data has 2 labels, it picks the first and the last,
    • your data has 3 labels, it picks the first, the last and one near the middle of the list (darkorange in your case)

    and so on...
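The spreading described above can be reproduced without plotting anything. This is a minimal sketch, assuming scatter's default behavior of scaling c from its minimum to its maximum; `color_names_for` is a hypothetical helper written only for this illustration, not a matplotlib function.

```python
import numpy as np
from matplotlib.colors import ListedColormap, Normalize, to_rgba

km_colors = ['green', 'cyan', 'brown', 'darkorange', 'purple', 'black']
cmap = ListedColormap(km_colors)


def color_names_for(labels):
    """Hypothetical helper: name of the km_colors entry each label is drawn with."""
    labels = np.asarray(labels, dtype=float)
    # Same scaling scatter applies by default: min(c) -> 0.0, max(c) -> 1.0
    norm = Normalize(vmin=labels.min(), vmax=labels.max())
    rgba = cmap(norm(labels))  # one RGBA row per label
    # Translate each RGBA row back to its name in km_colors
    return [
        next(name for name in km_colors if np.allclose(to_rgba(name), row))
        for row in rgba
    ]


print(color_names_for([0, 1]))     # ['green', 'black']
print(color_names_for([0, 1, 2]))  # ['green', 'darkorange', 'black']
```

With two labels, label 1 sits at the top of the normalized range and therefore lands on the last color, black, which matches what the question observed.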

    To have the colors chosen in the order of your list, you need to restrict the km_colors list to the number of colors required. Below is a sample scatter plot with random data that shows how this can be done. Note that I restrict the colors available to the scatter plot with cmap = ListedColormap(km_colors[0:(i*3+j+1)]), which gives the scatter plot just the first (i*3 + j + 1) colors. This relies on the labels running from 0 upwards with every label present in the data.

    import numpy as np
    import matplotlib.pyplot as plt
    from matplotlib.colors import ListedColormap

    # random points shared by all subplots
    x = np.random.rand(100)
    y = np.random.rand(100)
    km_colors = ['green', 'cyan', 'brown', 'darkorange', 'purple', 'black']
    fig, ax = plt.subplots(2, 3, sharex='col', figsize=(10, 8))
    for i in range(2):
        for j in range(3):
            n_labels = i*3 + j + 1  # 1 to 6 classes, one count per subplot
            clr_col = np.random.randint(n_labels, size=(100))
            # the colormap holds only as many colors as there are labels
            data_PCA = ax[i, j].scatter(x, y, s=55, c=clr_col,
                                        cmap=ListedColormap(km_colors[0:n_labels]),
                                        alpha=0.5)
            print(n_labels)
            print(np.unique(clr_col))
            print(km_colors[0:n_labels])

    plt.tight_layout()
    plt.show()
    

    [Figure: output plot]
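As an alternative to slicing the color list, the normalization range can be pinned so that label k always lands on km_colors[k], even when some labels are missing from a subplot. This is a sketch under the assumption that the labels are integers from 0 to len(km_colors) - 1; the random data and variable names are illustrative only.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; remove it to view the plot
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import ListedColormap, to_rgba

km_colors = ['green', 'cyan', 'brown', 'darkorange', 'purple', 'black']
cmap = ListedColormap(km_colors)

# Illustrative random data in which only labels 0 and 1 occur
rng = np.random.default_rng(0)
x, y = rng.random(100), rng.random(100)
labels = rng.integers(0, 2, size=100)

fig, ax = plt.subplots()
# vmin/vmax fix the scaling to the full label range instead of min(c)..max(c),
# so label 1 maps to the second color (cyan) rather than the last one
sc = ax.scatter(x, y, c=labels, s=55, cmap=cmap,
                vmin=0, vmax=len(km_colors) - 1)
ax.legend(*sc.legend_elements(), loc="lower right", title="classes")
```

Because c is still a numeric array, legend_elements() keeps working as in the question's code. With an interactive backend, finish with plt.show().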