In my scatter plot, I want to color and size each data point based on some criteria. Here is the example of what I am doing:
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
%matplotlib inline
fig, ax = plt.subplots()
# dataframe with two columns: serial number and the corresponding fractures
df1 = pd.DataFrame({'SN': ['A1', 'A2', 'A3', 'A4', 'A5', 'A6', 'A7'],
                    'Fracture': [6, 90, 35, 60, 48, 22, 6]})
# dataframe with two columns: serial number and the corresponding force
df2 = pd.DataFrame({'SN': ['A1', 'A2', 'A3', 'A4', 'A5', 'A6', 'A7'],
                    'Force': [200, 140, 170, 150, 160, 210, 190]})
df3 = pd.merge(df1, df2, on='SN')
df4 = df3.sort_values(by='Force')['Fracture']
#c
colors = ['g' if int(i) < 25 else 'yellow' if int(i) < 50 else 'orange' for i in df4]
size_map = []
for i, c in enumerate(df4):
    if colors[i] == 'g':
        size_map.append(max(1 - (c / 25) * 0.99, 0.05) * 200)
    elif colors[i] == 'yellow':
        size_map.append(max(1 - (c / 50) * 0.99, 0.05) * 200)
    elif colors[i] == 'orange':
        size_map.append(max(1 - (c / 100) * 0.99, 0.05) * 200)
wp = [0.1, 0.25, 0.35, 0.45, 0.55, 0.72, 0.9]
Y = df2['Force'].sort_values()
fig, ax = plt.subplots()
sns.scatterplot(x=Y, y=wp)
ax.collections[0].set_color(colors)
ax.collections[0].set_sizes(size_map)
I have tried different formulas for the size map, but none of them correctly calculates the size based on the given criteria. I appreciate any input.
The approach below suggests some changes.
Put all data into one dataframe instead of working with separate lists. This is especially important when sorting is involved, as the dataframe keeps all columns together in the newly sorted order.
Make use of seaborn's way of coloring. Seaborn uses a column as hue=, together with a palette which maps each hue value to its corresponding color. In the code below, 0, 1 and 2 are used for the 3 groups.
Make use of seaborn's way of setting the sizes. One column is used as size=, and the dot sizes grow with its values. The sizes=(20, 200) parameter makes sure the smallest value of that column is mapped to dot size 20 and the largest to dot size 200. You can set it to sizes=(200, 20) to reverse the sizes (i.e., the smallest value is mapped to 200).
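To illustrate the sizes= mapping described above, here is a minimal sketch that mimics it with plain numpy. It assumes the mapping between the smallest and largest value is linear; `scale_sizes` is a hypothetical helper written for this illustration, not part of seaborn.

```python
import numpy as np

def scale_sizes(values, sizes=(20, 200)):
    """Hypothetical helper: linearly map values onto the range given by `sizes`.

    Assumes `values` contains at least two distinct numbers.
    """
    values = np.asarray(values, dtype=float)
    lo, hi = values.min(), values.max()
    # smallest value -> sizes[0], largest value -> sizes[1]
    return np.interp(values, (lo, hi), sizes)

size_values = [5, 10, 23]
normal = scale_sizes(size_values, sizes=(20, 200))     # small values give small dots
reversed_ = scale_sizes(size_values, sizes=(200, 20))  # small values give large dots
print(normal)
print(reversed_)
```

Swapping the two numbers in sizes= is enough to invert the relationship, so the data column itself does not need to be changed.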
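The code below supposes you have 3 groups, matching the color thresholds in the question: fractures below 25, from 25 up to (but not including) 50, and 50 or above.
A Size column is created, which subtracts the start value of each group, and divides the last group by 2. That way all values of the Size column are in the range 0 to 25, provided no fracture exceeds 100.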
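The same group and size computation can also be sketched in vectorized form, which may be easier to check against the description above. This is only an alternative sketch using `np.select`; the dataframe name `df` is chosen for the illustration and the fracture values are the ones from the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Fracture': [6, 90, 35, 60, 48, 22, 6]})

# group index: 0 for fractures below 25, 1 for 25 to below 50, 2 for 50 and above
df['ColorCode'] = np.clip(df['Fracture'] // 25, 0, 2)

frac = df['Fracture'].to_numpy()
code = df['ColorCode'].to_numpy()

# subtract the start value of each group; halve the last group so it also spans 0 to 25
df['Size'] = np.select(
    [code == 0, code == 1],
    [frac, frac - 25],
    default=(frac - 50) / 2,
)
print(df)
```

Within each group, a larger fracture gives a larger Size value, and the three groups share a common scale.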
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
# dataframe with three columns, serial number and the corresponding fracture and force
df3 = pd.DataFrame({'SN': ['A1', 'A2', 'A3', 'A4', 'A5', 'A6', 'A7'],
                    'Fracture': [6, 90, 35, 60, 48, 22, 6],
                    'Force': [200, 140, 170, 150, 160, 210, 190]})
df5 = df3.sort_values(by='Force')
# colorcode:
df5['ColorCode'] = np.clip(df5['Fracture'].astype(int) // 25, 0, 2)
palette = {0: 'limegreen', 1: 'gold', 2: 'orange'}
df5['Size'] = [frac if ccode == 0 else
               frac - 25 if ccode == 1 else
               (frac - 50) / 2
               for frac, ccode in zip(df5['Fracture'], df5['ColorCode'])]
df5['wp'] = [0.1, 0.25, 0.35, 0.45, 0.55, 0.72, 0.9]
fig, ax = plt.subplots()
sns.scatterplot(data=df5, x='Force', y='wp', hue='ColorCode', palette=palette,
                size='Size', sizes=(20, 200), legend=False, ax=ax)
sns.despine()
# for testing, show the fracture next to the dots
for x, y, fr in zip(df5['Force'], df5['wp'], df5['Fracture']):
    ax.text(x, y, f' {fr:.0f}')
plt.show()