Tags: python, pandas, matplotlib, legend, scatter-plot

Create a color-coded key for a matplotlib scatter plot with specific colors


Here is the data:

import pandas as pd

data = {'letter': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X'], 'color': ['#FF0000', '#FF7F00', '#FFD400', '#FFFF00', '#BFFF00', '#6AFF00', '#00EAFF', '#0095FF', '#0040FF', '#AA00FF', '#FF00AA', '#EDB9B9', '#E7E9B9', '#B9EDE0', '#B9D7ED', '#DCB9ED', '#8F2323', '#8F6A23', '#4F8F23', '#23628F', '#6B238F', '#000000', '#737373', '#CCCCCC'], 'percent': [0.59, 0.569, 0.343, 0.791, 0.099, 0.047, 0.387, 0.232, 0.262, 0.177, 0.522, 0.317, 0.252, 0.617, 0.644, 0.571, 0.382, 0.12, 0.281, 0.855, 0.283, 1.0, 0.844, 0.499], 'score': [0.541, 0.399, 0.625, 0.584, 0.83, 0.859, 0.62, 0.618, 0.545, 0.536, 0.513, 0.563, 0.592, 0.276, 0.037, 0.0, 0.5, 0.653, 0.485, 0.213, 0.44, 0.0, 0.308, 0.35]}
df = pd.DataFrame(data)

# display(df.head())
  letter    color  percent  score
0      A  #FF0000    0.590  0.541
1      B  #FF7F00    0.569  0.399
2      C  #FFD400    0.343  0.625
3      D  #FFFF00    0.791  0.584
4      E  #BFFF00    0.099  0.830

The leftmost column is the index.

This code creates a scatter plot:

df.plot.scatter(x='percent', y='score', color=df['color'])


On the right, I want a key showing which color represents which letter, ideally a list of solid-colored rectangles, each followed by its letter. I have not found a solution that lets me use colors I have chosen myself, and I need that because several plots must be color-coded the same way.


Solution

  • You can build a custom legend from matplotlib.patches.Patch handles, one per letter.

    import matplotlib.patches as mpatches
    
    ax = df.plot.scatter(x='percent', y='score', color=df['color'])
    
    # One solid-colored patch per letter; the label is stored on the handle itself
    handles = [mpatches.Patch(color=colour, label=letter)
               for letter, colour in zip(df['letter'], df['color'])]
    
    # Pin the legend's upper-left corner to the axes' upper-right corner
    ax.legend(handles=handles, ncol=2, loc='upper left', bbox_to_anchor=(1, 1))
    

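Since the question mentions several plots that must share one color coding, the Patch approach can be wrapped so that every plot reads from the same lookup. The following is only a sketch under assumptions: `color_by_letter` and `add_color_key` are hypothetical names, not part of matplotlib or pandas, and the DataFrame is shortened to the first five rows of the question's data.

```python
import matplotlib.patches as mpatches
import matplotlib.pyplot as plt
import pandas as pd

# First five rows of the question's data, shortened to keep the sketch small.
df = pd.DataFrame({
    'letter': ['A', 'B', 'C', 'D', 'E'],
    'color': ['#FF0000', '#FF7F00', '#FFD400', '#FFFF00', '#BFFF00'],
    'percent': [0.59, 0.569, 0.343, 0.791, 0.099],
    'score': [0.541, 0.399, 0.625, 0.584, 0.83],
})

# One shared letter -> color lookup, built once and reused by every plot.
color_by_letter = dict(zip(df['letter'], df['color']))


def add_color_key(ax, color_by_letter, ncol=1):
    """Hypothetical helper: attach a key of solid color patches to `ax`."""
    handles = [mpatches.Patch(color=color, label=letter)
               for letter, color in color_by_letter.items()]
    # Upper-left corner of the legend sits at the axes' upper-right corner.
    return ax.legend(handles=handles, ncol=ncol,
                     loc='upper left', bbox_to_anchor=(1, 1))


fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Left: all rows. Right: only a subset, colored through the same lookup.
subset = df[df['score'] > 0.55]
ax1.scatter(df['percent'], df['score'],
            c=df['letter'].map(color_by_letter).tolist())
ax2.scatter(subset['percent'], subset['score'],
            c=subset['letter'].map(color_by_letter).tolist())

# A single key on the right-hand axes covers both plots.
legend = add_color_key(ax2, color_by_letter)
fig.tight_layout()
```

Because the key is built from the lookup and not from the plotted rows, it lists every letter even when a given plot shows only some of them, so the keys of different figures stay identical.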

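    Alternatively, you could use seaborn, which builds the legend from the hue variable:

    import seaborn as sns
    
    # A list palette is matched to the hue levels by position:
    # one color per unique letter, in order of first appearance
    ax = sns.scatterplot(data=df, x='percent', y='score', hue='letter', palette=df['color'].tolist())
    ax.legend(ncol=2, loc='upper left', bbox_to_anchor=(1, 1))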
    

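A list palette depends on the order of the hue levels, so a plot that shows only some letters could assign different colors. One way around this, sketched here under assumptions, is to pass seaborn a dict palette keyed by letter. The `palette` dict and the `subset` selection are illustrative names, and the data is again shortened to five rows.

```python
import pandas as pd
import seaborn as sns

# First five rows of the question's data, shortened to keep the sketch small.
df = pd.DataFrame({
    'letter': ['A', 'B', 'C', 'D', 'E'],
    'color': ['#FF0000', '#FF7F00', '#FFD400', '#FFFF00', '#BFFF00'],
    'percent': [0.59, 0.569, 0.343, 0.791, 0.099],
    'score': [0.541, 0.399, 0.625, 0.584, 0.83],
})

# Explicit letter -> color mapping; seaborn accepts a dict as `palette`.
palette = dict(zip(df['letter'], df['color']))

# Plot only some letters: each one still gets the color assigned to it above.
subset = df[df['letter'].isin(['B', 'D', 'E'])]
ax = sns.scatterplot(data=subset, x='percent', y='score',
                     hue='letter', palette=palette)
ax.legend(ncol=2, loc='upper left', bbox_to_anchor=(1, 1))
```

Unlike the Patch-based key, this legend lists only the letters present in the plotted data.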