Tags: python, matplotlib, cluster-analysis, scatter-plot

Python scatter plot: soft clustering


I have 2D data that I clustered using the EM algorithm with soft classification. There are 3 different clusters, so I have a probability matrix P of shape (n_clusters, n_datapoints).

Now I'd like to plot the individual datapoints in a scatter plot and assign a certain color to each cluster. The color of each point should reflect its probability of belonging to each cluster, and is thus a mixture of the cluster colors.

All I have achieved so far is the following, with red, green and blue as the cluster colors:

[Image: scatter plot]

using the following lines of code:

for n in range(X.shape[0]):
    color = np.array([P[0,n],P[1,n],P[2,n]])[np.newaxis]
    plt.scatter(X[n,0],X[n,1],c=color)

How can I assign a different, specific color to each cluster? E.g. orange for class 0, blue for class 1, magenta for class 2.


Solution

  • I would create a dict where each cluster has its own color. Then I would just add all cluster colors together, each multiplied by its probability:

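    # One RGB color per cluster, scaled to the 0-1 range matplotlib expects.
    # Swap in any RGB triples you like, e.g. [255, 165, 0] for orange.
    cluster_colors = {0: np.array([255, 0, 0]) / 255,
                      1: np.array([0, 255, 0]) / 255,
                      2: np.array([0, 0, 255]) / 255}
    colors = []
    for n in range(X.shape[0]):
        color = np.zeros(3)
        for c in range(P.shape[0]):
            # weight each cluster color by the point's membership probability
            color += cluster_colors[c] * P[c, n]
        colors.append(color)

    # plot all points in one call, one RGB row per point
    plt.scatter(X[:, 0], X[:, 1], c=np.array(colors))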

    If I understood your data correctly, this should just work.
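
The same mixing can also be written without explicit loops. The following is only a sketch under assumptions: `X` and `P` are filled with random placeholder data in the shapes described in the question, the three named colors are the ones asked for (orange, blue, magenta), and the output filename is hypothetical. Each point's color is the probability-weighted sum of the cluster colors, computed as a single matrix product.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import to_rgb

# Placeholder data (assumption): replace with your own X and P.
rng = np.random.default_rng(0)
n_points = 200
X = rng.normal(size=(n_points, 2))               # 2D data, shape (n_points, 2)
P = rng.dirichlet(np.ones(3), size=n_points).T   # memberships, shape (3, n_points)

# One RGB row per cluster, already in matplotlib's 0-1 range.
cluster_rgb = np.array([to_rgb(name) for name in ("orange", "blue", "magenta")])

# Probability-weighted mix of the cluster colors: (n_points, 3) @ (3, 3).
colors = P.T @ cluster_rgb
# Guard against tiny floating-point overshoot outside [0, 1].
colors = np.clip(colors, 0.0, 1.0)

fig, ax = plt.subplots()
ax.scatter(X[:, 0], X[:, 1], c=colors)
fig.savefig("soft_clusters.png")  # hypothetical filename; or call plt.show()
```

Since the columns of `P` sum to 1, a point that belongs fully to one cluster gets exactly that cluster's color, and uncertain points end up with a blend in between.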