Tags: python, matplotlib, legend, scatter

Plot scatter and legends from a data frame


I am having a problem displaying a legend in a scatter plot. Here is a simple example of the problem: each integer from 1 to 10 has its own color, and I want the legend to show all ten numbers with their colors (basically, each color next to its corresponding number).

I have all the values in a dataframe (the dataframe shown here is only an example; the real one consists of hundreds of rows).

    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt

    x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
    y = 2 * x

    df = pd.DataFrame()
    palette = {1: "blue", 2: "orange", 3: "green", 4: "red", 5: "purple", 6: "brown", 7: "pink", 8: "gray", 9: "olive", 10: "cyan"}
    df["first"] = x
    df["second"] = y
    df["third"] = df["first"].apply(lambda x: palette[x])
    plt.scatter(df["first"], df["second"], c=df["third"])
    plt.legend()
    plt.show()

Adding an extra argument to the scatter call (legend = c=df2["third"]) does not help either.

I cannot find a solution for this.

Thanks in advance for any pointers.


Solution

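  • Each entry in a standard legend corresponds to one plotted artist that has a label. The single scatter call in the question has no label, so plt.legend() finds nothing to show. You can instead draw the scatter plot one color at a time and give each call the corresponding label.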

    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    
    x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
    y = 2 * x
    
    df = pd.DataFrame()
    palette = {1: "blue", 2: "orange", 3: "green", 4: "red", 5: "purple", 6: "brown", 7: "pink", 8: "gray", 9: "olive", 10: "cyan"}
    df["first"] = x
    df["second"] = y
    df["third"] = df["first"].apply(lambda x: palette[x])
    for number, color in palette.items():
        plt.scatter(x="first", y="second", c=color, data=df[df["first"] == number], label=number)
    plt.legend()
    plt.show()
    

    [Image: scatter plot with labels per color]

    The process can be simplified a lot with Seaborn's hue parameter, which colors the points by a column and builds the legend automatically:

    import matplotlib.pyplot as plt
    import seaborn as sns
    import pandas as pd
    import numpy as np
    
    x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
    y = 2 * x
    
    df = pd.DataFrame()
    palette = {1: "blue", 2: "orange", 3: "green", 4: "red", 5: "purple", 6: "brown", 7: "pink", 8: "gray", 9: "olive", 10: "cyan"}
    df["first"] = x
    df["second"] = y
    df["third"] = df["first"].apply(lambda x: palette[x])
    sns.set()
    sns.scatterplot(data=df, x="first", y="second", hue="first", palette=palette)
    plt.show()
    

    [Image: seaborn sns.scatterplot output]
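If you would rather keep the single scatter call from the question, another option is to build the legend entries by hand. This is only a minimal sketch, not part of the original answer: it assumes the same data and palette as above and uses matplotlib's Line2D as a "proxy artist", meaning one marker per palette entry that exists only for the legend. The marker shape and the legend title are arbitrary choices.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D

x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
palette = {1: "blue", 2: "orange", 3: "green", 4: "red", 5: "purple",
           6: "brown", 7: "pink", 8: "gray", 9: "olive", 10: "cyan"}

df = pd.DataFrame({"first": x, "second": 2 * x})
df["third"] = df["first"].map(palette)  # color name for each row

fig, ax = plt.subplots()

# One scatter call for all rows, colored by the "third" column as in the question.
ax.scatter(df["first"], df["second"], c=df["third"])

# Proxy artists: marker-only Line2D objects with no data, one per palette entry.
# They are never drawn on the axes; the legend just uses their color and label.
handles = [
    Line2D([], [], linestyle="", marker="o", color=color, label=str(number))
    for number, color in palette.items()
]
legend = ax.legend(handles=handles, title="first")  # title is an arbitrary choice
```

Call plt.show() afterwards to display the figure. Compared with the loop over colors, this keeps the plotting call unchanged, but the legend is no longer tied to the plotted data, so it is up to you to keep the palette and the "third" column in sync.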