I'm trying to create a scatter plot where the size of the marker and the marker itself vary based on column data:
#Get the different type of rated episodes and color
office.loc[office['scaled_ratings'] < 0.25, 'mcolor'] = 'red'
office.loc[(office['scaled_ratings'] >= 0.25) & (office['scaled_ratings'] < 0.50), 'mcolor'] = 'orange'
office.loc[(office['scaled_ratings'] >= 0.50) & (office['scaled_ratings'] < 0.75), 'mcolor'] = 'lightgreen'
office.loc[office['scaled_ratings'] >= 0.75, 'mcolor'] = 'darkgreen'
#Get episodes with guests and the marker size
office.loc[office['has_guests']== False, 'size'] = 25
office.loc[office['has_guests']== True, 'size'] = 250
#Set marker for episodes
office.loc[office['has_guests']== True, 'marker'] = '*'
office.loc[office['has_guests']== False,'marker'] = 'o'
#Create plot
plt.scatter(office['episode_number'], office['viewership_mil'], c = office['mcolor'], s= office['size'], marker= office['marker'])
This method of passing columns as arguments to the function worked perfectly until I added the marker column, where I get:
TypeError: unhashable type: 'Series'
I'd like to know why it doesn't work in this instance, and how I could change it so that I get a specific marker based on column data.
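Unlike c and s, which accept one value per point, the marker argument of scatter must be a single marker style per call, so passing a whole column (a Series) raises the error. The workaround is to split the data frame by marker and call scatter once for each unique marker. The code below creates sample data and plots it this way. See the examples in the official reference.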
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.DataFrame({'x': np.random.rand(10),
                   'y': np.random.rand(10),
                   'colors': np.random.choice(['C0','C1','C2'], size=10),
                   'sizes': np.random.randint(20,70,10),
                   'symbols': np.random.choice(['*','o','^'], size=10)})
fig, ax = plt.subplots()
# Draw one scatter call per unique marker; colors and sizes still vary per point
for m in df['symbols'].unique():
    dff = df[df['symbols'] == m]
    ax.scatter(dff['x'], dff['y'], c=dff['colors'], s=dff['sizes'], marker=m)
plt.show()
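The same idea applied to the columns from the question might look like the sketch below. The `office` values here are made up purely for illustration (the real data is not shown in the question), and using `groupby`, `map`, and `pd.cut` is just one possible way to write it, not the only one.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the `office` data frame; the values are invented.
office = pd.DataFrame({
    "episode_number": [1, 2, 3, 4, 5, 6],
    "viewership_mil": [11.2, 6.0, 5.8, 5.4, 5.0, 4.8],
    "scaled_ratings": [0.10, 0.30, 0.55, 0.80, 0.45, 0.90],
    "has_guests": [False, True, False, True, False, False],
})

# Same thresholds as in the question, written as left-closed bins.
office["mcolor"] = pd.cut(
    office["scaled_ratings"],
    bins=[-float("inf"), 0.25, 0.50, 0.75, float("inf")],
    labels=["red", "orange", "lightgreen", "darkgreen"],
    right=False,
).astype(str)

# Marker size and marker symbol both depend on `has_guests`.
office["size"] = office["has_guests"].map({True: 250, False: 25})
office["marker"] = office["has_guests"].map({True: "*", False: "o"})

fig, ax = plt.subplots()
# One scatter call per marker symbol; c and s are still per-point columns.
for marker, group in office.groupby("marker"):
    ax.scatter(
        group["episode_number"],
        group["viewership_mil"],
        c=group["mcolor"],
        s=group["size"],
        marker=marker,
    )
ax.set_xlabel("episode_number")
ax.set_ylabel("viewership_mil")
```

Call plt.show() afterwards to display the figure. Each distinct marker produces its own collection on the axes, so with two markers there are two scatter calls in total, regardless of how many rows the data frame has.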