Tags: python, matplotlib, seaborn, scatter-plot

Scatterplot doesn't use the correct markers, and the colorbar needs fixing


I am trying to visualize a UMAP embedding and already have the following code and plot. My goal is to use two different markers for the two classes in my dataset, and a separate color for each group in my dataset (the groups are VP XXX; see the colorbar in the image). The coloring already works, more or less.

The issue is that the markers aren't the ones I am trying to get, and the colorbar isn't very accurate in telling me which color belongs to which group.

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

#### prepare lists ####
VP_with_col = []
m=[]
col = []
# a NumPy array is needed so that embedding[:,0] works further down
embedding = np.array([[0,0.8],[0.5,0.5], [0.9,0.5],[0.2,0.9],[0.4,0.4],[0.6,0.5],[0.77,0.59],[0.8,0.1]])
EXAMPLE_VP = ["VP124","VP124","VP125", "VP125", "VP203", "VP203","VP258","VP258"]
EXAMPLE_LABELS = [0,1,0,1,0,1,0,1]
dataframe = pd.DataFrame({"VP": EXAMPLE_VP, "label": EXAMPLE_LABELS})
VP_list = dataframe.VP.unique()
# add color/value to each unique VP
for idx,vp in enumerate(VP_list): 
    VP_with_col.append([1+idx, vp]) #somehow this gives me a different color for each group which is great

#create color array of length len(dataframe.VP) with a color for each group
for idx, vp in enumerate(dataframe.VP):
    for vp_col in VP_with_col:
        if(vp_col[1] == vp):
            col.append(vp_col[0])

#### create marker list ####
for elem in dataframe.label:
    if(elem == 0):
        m.append("o")
    else:
        m.append("^")

########################## relevant part for question ############################

#### create plot dataframe from lists and UMAP embedding ####
plot_df = pd.DataFrame(data={"x":embedding[:,0], "y": embedding[:,1], "color":col, "marker": m })

plt.style.use("seaborn")   
plt.figure()

#### Plot ####
ax= sns.scatterplot(data=plot_df, x="x",y="y",style= "marker" , c= col, cmap='Spectral', s=5 )
ax.set(xlabel = None, ylabel = None)
plt.gca().set_aspect('equal', 'datalim')

#### Colorbar ####
norm = plt.Normalize(min(col), max(col))
sm = plt.cm.ScalarMappable(cmap="Spectral", norm=norm)
sm.set_array([])

# Remove the legend(marker legend) , add colorbar
ax.get_legend().remove()
cb = ax.figure.colorbar(sm)  

cb.set_ticks(np.arange(len(VP_list)))
cb.set_ticklabels(VP_list)

##### save
plt.title('UMAP projection of feature space', fontsize=12) 
plt.savefig("./umap_plot",dpi=1200)

This gives me the plot below, which uses the default marker and an 'x' marker instead. The marker column of the dataframe that I pass via style = "marker" looks something like ["^", "o", "^", "^", "^", "o", ...]:

[image: resulting UMAP scatter plot with colorbar]

Is it also possible to make it clearer which color belongs to which class in the colorbar?


Solution

  • You're doing a lot of manipulations that would be needed for matplotlib without Seaborn. With Seaborn, most of this happens automatically. Here is how it could look with your test data:

    import matplotlib.pyplot as plt
    import seaborn as sns
    import pandas as pd
    import numpy as np
    
    embedding = np.array([[0, 0.8], [0.5, 0.5], [0.9, 0.5], [0.2, 0.9], [0.4, 0.4], [0.6, 0.5], [0.77, 0.59], [0.8, 0.1]])
    EXAMPLE_VP = ["VP124", "VP124", "VP125", "VP125", "VP203", "VP203", "VP258", "VP258"]
    EXAMPLE_LABELS = [0, 1, 0, 1, 0, 1, 0, 1]
    plot_df = pd.DataFrame({"x": embedding[:, 0], "y": embedding[:, 1], "VP": EXAMPLE_VP, "label": EXAMPLE_LABELS})
    
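    # set the style before creating the figure so that it applies to it
    plt.style.use("seaborn")  # named "seaborn-v0_8" on matplotlib >= 3.6
    plt.figure()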
    
    ax = sns.scatterplot(data=plot_df, x="x", y="y",
                         hue='VP', palette='Spectral',
                         style="label", markers=['^', 'o'], s=100)
    ax.set(xlabel=None, ylabel=None)
    ax.set_aspect('equal', 'datalim')
    # sns.move_legend(ax, bbox_to_anchor=(1.01, 1.01), loc='upper left')
    
    plt.tight_layout()
    plt.show()
    

    seaborn scatter plot with colors and markers
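
The markers in the question came out wrong because `style=` only names the column that defines the groups. Its values (`"o"`, `"^"`) are treated as category names, not as marker codes, so Seaborn picks its own default markers for them. The markers themselves are set with `markers=`. With a list, as above, the markers are assigned to the levels in order, so label 0 gets `'^'`. The following is a minimal sketch of an explicit assignment, assuming label 0 should be a circle and label 1 a triangle as in the question's loop; the name `marker_map` is introduced here only for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Same test data as in the question
plot_df = pd.DataFrame({
    "x": [0, 0.5, 0.9, 0.2, 0.4, 0.6, 0.77, 0.8],
    "y": [0.8, 0.5, 0.5, 0.9, 0.4, 0.5, 0.59, 0.1],
    "VP": ["VP124", "VP124", "VP125", "VP125", "VP203", "VP203", "VP258", "VP258"],
    "label": [0, 1, 0, 1, 0, 1, 0, 1],
})

# Hypothetical name: maps each value of the "label" column to a matplotlib marker
marker_map = {0: "o", 1: "^"}

fig, ax = plt.subplots()
sns.scatterplot(data=plot_df, x="x", y="y",
                hue="VP",                            # one color per group
                style="label", markers=marker_map,   # one marker per class
                s=100, ax=ax)
ax.set(xlabel=None, ylabel=None)
# call plt.show() or fig.savefig(...) afterwards to display or save the figure
```

A dict keeps the assignment independent of the order in which the labels appear, which is probably the safer choice once the real data has more than two classes.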

    Note that the 'Spectral' colormap assigns a light yellow color to 'VP203' which is difficult to see with the default background. You might want to use e.g. palette='Set2' for the colors.
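
As a sketch of that suggestion, the variant below swaps in `palette='Set2'` and moves the legend outside the axes with `sns.move_legend` (available from Seaborn 0.11.2, as far as I know), so that it does not cover any points. The legend then takes over the role of the colorbar from the question, with one entry per VP group. The anchor coordinates are the ones from the commented-out line above and may need adjusting for your figure size.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Same test data as in the question
plot_df = pd.DataFrame({
    "x": [0, 0.5, 0.9, 0.2, 0.4, 0.6, 0.77, 0.8],
    "y": [0.8, 0.5, 0.5, 0.9, 0.4, 0.5, 0.59, 0.1],
    "VP": ["VP124", "VP124", "VP125", "VP125", "VP203", "VP203", "VP258", "VP258"],
    "label": [0, 1, 0, 1, 0, 1, 0, 1],
})

fig, ax = plt.subplots()
sns.scatterplot(data=plot_df, x="x", y="y",
                hue="VP", palette="Set2",   # qualitative palette, no pale yellow
                style="label", markers=["^", "o"], s=100, ax=ax)
ax.set(xlabel=None, ylabel=None)

# Place the legend just outside the top-right corner of the axes
sns.move_legend(ax, loc="upper left", bbox_to_anchor=(1.01, 1.01))
fig.tight_layout()
# call plt.show() or fig.savefig(...) afterwards to display or save the figure
```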