python matplotlib scatter-plot

Change scatter marker based on column data


I'm trying to create a scatter plot where both the marker size and the marker shape vary based on column data:

# Assign a color to each episode based on its scaled rating
office.loc[office['scaled_ratings'] < 0.25, 'mcolor'] = 'red'
office.loc[(office['scaled_ratings'] >= 0.25) & (office['scaled_ratings'] < 0.50), 'mcolor'] = 'orange'
office.loc[(office['scaled_ratings'] >= 0.50) & (office['scaled_ratings'] < 0.75), 'mcolor'] = 'lightgreen'
office.loc[office['scaled_ratings'] >= 0.75, 'mcolor'] = 'darkgreen'

# Set the marker size: larger for episodes with guests
office.loc[office['has_guests'] == False, 'size'] = 25
office.loc[office['has_guests'] == True, 'size'] = 250

# Set the marker shape: a star for episodes with guests
office.loc[office['has_guests'] == True, 'marker'] = '*'
office.loc[office['has_guests'] == False, 'marker'] = 'o'

# Create plot
plt.scatter(office['episode_number'], office['viewership_mil'], c=office['mcolor'], s=office['size'], marker=office['marker'])

Passing columns as arguments to the function worked perfectly until I added the marker column, at which point I get:

TypeError: unhashable type: 'Series'

I'd like to know why it doesn't work in this instance, and how I could change it so that each point gets a specific marker based on column data.


Solution

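  • A single scatter call accepts only one marker style, so passing a whole column (a Series) as marker raises the TypeError. The c and s arguments accept arrays, which is why the color and size columns worked. The workaround is to split the data frame by marker and call scatter once per marker. The code below creates sample data and plots it this way. See the examples in the official reference.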

    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    
    df = pd.DataFrame({'x': np.random.rand(10),
                       'y': np.random.rand(10),
                       'colors': np.random.choice(['C0','C1','C2'], size=10),
                       'sizes': np.random.randint(20,70,10),
                       'symbols': np.random.choice(['*','o','^'], size=10)})
    
    fig, ax = plt.subplots()
    
    # Draw one scatter call per distinct marker, since marker must be a single value
    for m in df['symbols'].unique():
        dff = df[df['symbols'] == m]  # rows that share this marker
        ax.scatter(dff['x'], dff['y'], c=dff['colors'], s=dff['sizes'], marker=m)
    plt.show()
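
The same idea can be applied to the data frame from the question. What follows is only a minimal sketch: the office frame here is a small made-up stand-in (the real data is not shown in the question), and the use of pd.cut, np.where and groupby is an illustrative choice rather than the only way to do it. The Agg backend is selected just so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's `office` data frame (values are invented).
office = pd.DataFrame({
    "episode_number": [1, 2, 3, 4, 5, 6],
    "viewership_mil": [11.2, 6.0, 5.8, 5.4, 5.0, 22.9],
    "scaled_ratings": [0.10, 0.30, 0.55, 0.80, 0.45, 0.95],
    "has_guests": [False, False, True, False, True, True],
})

# Color by rating band, using the same thresholds as the question.
office["mcolor"] = pd.cut(
    office["scaled_ratings"],
    bins=[-np.inf, 0.25, 0.50, 0.75, np.inf],
    labels=["red", "orange", "lightgreen", "darkgreen"],
    right=False,  # intervals are [low, high), matching the >= / < tests
).astype(str)

# Size and marker depend on whether the episode has guests.
office["size"] = np.where(office["has_guests"], 250, 25)
office["marker"] = np.where(office["has_guests"], "*", "o")

fig, ax = plt.subplots()

# One scatter call per marker; color and size can still vary per point.
for marker, group in office.groupby("marker"):
    ax.scatter(
        group["episode_number"],
        group["viewership_mil"],
        c=group["mcolor"],
        s=group["size"],
        marker=marker,
    )

ax.set_xlabel("episode_number")
ax.set_ylabel("viewership_mil")
# Call plt.show() here when using an interactive backend.
```

Each pass through the loop adds one collection to the axes, so two distinct markers produce two scatter calls while the per-point colors and sizes are preserved.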
    
