
How can I mark the mean point in a Plotly strip plot?


I need help adding a marker for the mean of each group to this plot. (The image shows the result I want.)

import pandas as pd
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
import plotly.express as px

# Build a random sample dataset: 400 rows, two job titles, two experience levels
data = pd.DataFrame({
    'job_title': np.random.choice(['data_science', 'Data_analysis'], 400),
    'experience_level': np.random.choice(['entry', 'senior'], 400),
    'salary': np.random.choice(50000, 400),  # random integers in [0, 50000)
})
data.head(1)
data = data.sort_values(by='experience_level', ascending=True)
fig = px.strip(data, x='job_title', y='salary', color='experience_level')

fig.update_layout(width=800, height=600)
fig.show()

In the desired result, the blue and orange dots show the means.


Solution

  • There does not seem to be a straightforward solution, but I think I found a way around the problem:

    fig = px.strip(data, x='job_title', y='salary', color='experience_level')
    
    # Calculate mean points for each strip category
    mean_points = data.groupby(['job_title', 'experience_level'])['salary'].mean().reset_index()
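
To make the shape of `mean_points` concrete, here is a minimal sketch on a tiny hand-made frame (the `toy` data and its values are made up for illustration, they are not from the question). It shows that the groupby/mean/reset_index chain yields one row per (job_title, experience_level) combination present in the data, with the mean in the `salary` column:

```python
import pandas as pd

# Hypothetical toy data, only to illustrate the shape of the result
toy = pd.DataFrame({
    "job_title": ["Data_analysis", "Data_analysis", "data_science", "data_science"],
    "experience_level": ["entry", "entry", "senior", "entry"],
    "salary": [10, 30, 50, 70],
})

# Same chain as above: one mean per (job_title, experience_level) pair,
# then reset_index() turns the group keys back into ordinary columns
toy_means = (
    toy.groupby(["job_title", "experience_level"])["salary"]
    .mean()
    .reset_index()
)
print(toy_means)
```

Because the group keys become regular columns again, each row can be read with `row["job_title"]`, `row["experience_level"]` and `row["salary"]` when iterating.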
    

    Then we have to draw a circle for each mean value:

    for index, row in mean_points.iterrows():
        # Create an offset for "entry" and "senior" points
        offset = 50 if row["experience_level"] == "entry" else -50
    
        # Use anchor to position the shape on the right strip
        fig.add_shape(type='circle',
                      xsizemode='pixel', ysizemode='pixel',
                      xanchor=row["job_title"],
                      yanchor=row["salary"],
                      x0=-5 + offset, x1=5 + offset,
                      y0=-5, y1=5,
                      line=dict(color='black', width=2),
                      fillcolor='red' if row["experience_level"] == "entry" else 'blue',
                      opacity=1)
    

    I used anchor points (xanchor and yanchor) so that the circle coordinates can be given in pixels relative to the anchor. Since row["job_title"] is a string, it is awkward to do arithmetic on it directly.

    Depending on the experience level, I use a different offset to move the circle along the x-axis (and I also change the color).

    Then you can plot it:

    fig.update_layout(width=800, height=600)
    fig.show()
    


    One drawback of this solution is that it relies on hardcoded pixel values, but this can be improved by making the offset depend on the figure size.
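
As a rough sketch of how the offset could follow the figure size, the helper below estimates the pixel distance between a category's centre and the centre of one group's strip. `group_offset_px` is a hypothetical helper, not part of Plotly. It assumes that the categorical axis splits the plot area evenly between categories, that grouped strips share the category width left after the default `boxgap` of 0.3, and that you can estimate the width of the plot area yourself (figure width minus margins and legend):

```python
def group_offset_px(plot_width_px, n_categories, n_groups, group_index, boxgap=0.3):
    """Estimate the x offset (in pixels) of one group's strip from its category centre.

    Hypothetical helper, not a Plotly API. Assumes evenly sized category slots
    and that groups evenly share the fraction (1 - boxgap) of each slot.
    group_index counts strips from left to right, starting at 0.
    """
    category_width = plot_width_px / n_categories
    usable_width = category_width * (1 - boxgap)
    slot_width = usable_width / n_groups
    # Centre of the group's slot, measured from the centre of the category
    return (group_index + 0.5) * slot_width - usable_width / 2


# Example: an 800 px wide figure whose plot area is assumed to be about 540 px
# wide (this number is a guess; measure or estimate it for your own figure).
left = group_offset_px(540, n_categories=2, n_groups=2, group_index=0)
right = group_offset_px(540, n_categories=2, n_groups=2, group_index=1)
print(left, right)
```

Under these assumptions the result is close to the hardcoded 50 pixels used above. It is worth checking in the rendered figure which strip each experience level lands on, so that the sign of the offset matches the strip you want to mark.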

    Hope this helps!