python-3.x pandas matplotlib seaborn

How to customize the size of the markers in a seaborn scatter plot


In my scatter plot, I want to color and size each data point based on some criteria. Here is an example of what I am doing:

import seaborn as sns
import matplotlib.pyplot as plt 
import pandas as pd 
import numpy as np 
%matplotlib inline 

fig, ax = plt.subplots()

#dataframe with two columns, serial number and the corresponding fractures
df1 = pd.DataFrame({'SN':['A1', 'A2', 'A3', 'A4', 'A5', 'A6', 'A7'], 
                    'Fracture': [6, 90, 35, 60, 48, 22, 6]})

#dataframe with two columns, serial number and the corresponding force
df2 = pd.DataFrame({'SN':['A1', 'A2', 'A3', 'A4', 'A5', 'A6', 'A7'], 
                    'Force': [200, 140, 170, 150, 160, 210, 190]})

df3 = pd.merge(df1, df2, on='SN')

df4 = df3.sort_values(by='Force')['Fracture']

#c
colors = ['g' if int(i) < 25 else 'yellow' if int(i) < 50 else 'orange' for i in df4]

size_map = []
for i, c in enumerate(df4):
    if colors[i] == 'g':
        size_map.append(max(1 - (c / 25) * 0.99, 0.05) * 200)
    elif colors[i] == 'yellow':
        size_map.append(max(1 - (c / 50) * 0.99, 0.05) * 200)
    elif colors[i] == 'orange':
        size_map.append(max(1 - (c / 100) * 0.99, 0.05) * 200)
        
wp = [0.1, 0.25, 0.35, 0.45, 0.55, 0.72, 0.9]
Y = df2['Force'].sort_values() 
fig, ax = plt.subplots() 
sns.scatterplot(x=Y, y=wp)      
ax.collections[0].set_color(colors)
ax.collections[0].set_sizes(size_map) 

I have tried different approaches for the size map, but none of them correctly calculates the size based on the given criteria. I would appreciate any input.


Solution

  • The approach below suggests some changes.

    Put all data into one dataframe instead of working with separate lists. This is especially important when sorting is involved, as the dataframe keeps all columns aligned in the newly sorted order.
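
As a minimal sketch of this point, using a shortened three-row version of the question's data (the variable names `fracture`, `force` and `merged` are chosen here for illustration only), sorting the merged dataframe moves all columns together, and a list assigned afterwards follows the sorted row order:

```python
import pandas as pd

fracture = pd.DataFrame({'SN': ['A1', 'A2', 'A3'], 'Fracture': [6, 90, 35]})
force = pd.DataFrame({'SN': ['A1', 'A2', 'A3'], 'Force': [200, 140, 170]})

# merge first, then sort: 'SN', 'Fracture' and 'Force' are reordered together
merged = pd.merge(fracture, force, on='SN').sort_values(by='Force')

# a plain list is assigned by position, so it follows the sorted row order
merged['wp'] = [0.1, 0.5, 0.9]

print(merged)
```

Because every value lives in the same row, there is no need to sort a second series separately and hope that both end up in the same order.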

    Make use of seaborn's way for coloring. Seaborn uses a column as hue=, together with a palette which maps each hue value to its corresponding color. In the code below, 0, 1 and 2 are used for the 3 groups.

    Make use of seaborn's way for setting the sizes. One column is used as size=, and the dot sizes scale with its values. The sizes=(20, 200) parameter makes sure the smallest value is mapped to dot size 20 and the largest to dot size 200. You can set it to sizes=(200, 20) to reverse the mapping (i.e., the smallest value gets dot size 200).
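
The effect of the sizes= range can be checked with a small sketch. The helper `mapped_sizes` and the sample values are made up for this illustration; the sketch assumes that the drawn markers are available as `ax.collections[0]`, as in the question's code, and it selects the Agg backend only so that it runs without a display:

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns


def mapped_sizes(values, size_range):
    """Draw a scatter plot and return the marker sizes seaborn assigned."""
    df = pd.DataFrame({'x': range(len(values)), 'y': 0, 'Size': values})
    fig, ax = plt.subplots()
    sns.scatterplot(data=df, x='x', y='y', size='Size', sizes=size_range,
                    legend=False, ax=ax)
    sizes = ax.collections[0].get_sizes().tolist()
    plt.close(fig)
    return sizes


values = [0, 5, 10, 25]
normal = mapped_sizes(values, (20, 200))     # smallest value gets the smallest dot
reversed_ = mapped_sizes(values, (200, 20))  # smallest value gets the largest dot
print(normal)
print(reversed_)
```

Values between the minimum and the maximum are placed linearly between the two ends of the range.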

    The code below supposes you have 3 groups:

    • 0: Fracture from 0 up to, but not including, 25
    • 1: Fracture from 25 up to, but not including, 50
    • 2: Fracture from 50 to 100

    A size column is created, which subtracts the start value of each group, and divides the last group by 2. That way all values of the size column are in the range 0 to 25.
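
For single values, the grouping and size rule described above can be sketched as a plain function. The name `group_and_size` is hypothetical and only serves this illustration; the thresholds are the ones listed above:

```python
def group_and_size(fracture):
    """Return (group, size) for one fracture value between 0 and 100."""
    # integer division by 25 gives 0, 1, 2, 3 or 4; clamp to the groups 0, 1, 2
    group = min(max(int(fracture) // 25, 0), 2)
    if group == 0:
        size = fracture             # group starts at 0
    elif group == 1:
        size = fracture - 25        # subtract the group's start value
    else:
        size = (fracture - 50) / 2  # last group spans 50 units, so halve it
    return group, size


for fracture in [6, 22, 35, 48, 60, 90]:
    print(fracture, group_and_size(fracture))
```

Each group then contributes sizes on the same 0 to 25 scale, so a value near the top of any group gets a dot of similar size.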

    import seaborn as sns
    import matplotlib.pyplot as plt
    import pandas as pd
    import numpy as np
    
    # dataframe with three columns, serial number and the corresponding fracture and force
    df3 = pd.DataFrame({'SN': ['A1', 'A2', 'A3', 'A4', 'A5', 'A6', 'A7'],
                        'Fracture': [6, 90, 35, 60, 48, 22, 6],
                        'Force': [200, 140, 170, 150, 160, 210, 190]})
    
    df5 = df3.sort_values(by='Force')
    
    # color code: 0 for Fracture below 25, 1 for 25 to 49, 2 for 50 and above
    df5['ColorCode'] = np.clip(df5['Fracture'].astype(int) // 25, 0, 2)
    palette = {0: 'limegreen', 1: 'gold', 2: 'orange'}
    
    # size within each group: subtract the group's start value; halve the last group
    df5['Size'] = [frac if ccode == 0 else
                   frac - 25 if ccode == 1 else
                   (frac - 50) / 2
                   for frac, ccode in zip(df5['Fracture'], df5['ColorCode'])]
    
    df5['wp'] = [0.1, 0.25, 0.35, 0.45, 0.55, 0.72, 0.9]
    
    fig, ax = plt.subplots()
    sns.scatterplot(data=df5, x='Force', y='wp', hue='ColorCode', palette=palette,
                    size='Size', sizes=(20, 200), legend=False, ax=ax)
    sns.despine()
    
    # for testing, show the fracture next to the dots
    for x, y, fr in zip(df5['Force'], df5['wp'], df5['Fracture']):
        ax.text(x, y, f'  {fr:.0f}')
    
    plt.show()
    

    [Figure: sns.scatterplot with sizes set per group]