I need help adding a marker (or some other sign) for the mean to this plot, as shown in the image. (The image below shows the result I want.)
import pandas as pd
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
import plotly.express as px
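# Random example data: 400 rows, two job titles, two experience levels,
# and integer salaries drawn uniformly from 0 to 49999
data = pd.DataFrame({'job_title': np.random.choice(['data_science', 'Data_analysis'], 400),
                     'experience_level': np.random.choice(['entry', 'senior'], 400),
                     'salary': np.random.choice(50000, 400)})
data.head(1)
# Sort so that the 'entry' strips are drawn before the 'senior' ones
data = data.sort_values(by='experience_level', ascending=True)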
fig = px.strip(data, x='job_title', y='salary', color='experience_level')
fig.update_layout(width=800, height=600)
fig.show()
There doesn't seem to be a straightforward solution, but I think I found a workaround:
fig = px.strip(data, x='job_title', y='salary', color='experience_level')
# Calculate mean points for each strip category
mean_points = data.groupby(['job_title', 'experience_level'])['salary'].mean().reset_index()
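For reference, `mean_points` is a flat table with one row per (job_title, experience_level) pair, which is what the loop below iterates over. This sketch uses a small hand-made DataFrame (made-up values, not the random data above) so the means are easy to check by hand:

```python
import pandas as pd

# Small, fixed dataset (made up for illustration) instead of random salaries
data = pd.DataFrame({
    "job_title": ["data_science", "data_science", "data_science",
                  "Data_analysis", "Data_analysis"],
    "experience_level": ["entry", "entry", "senior", "entry", "senior"],
    "salary": [40000, 50000, 90000, 30000, 70000],
})

# One mean per (job_title, experience_level) pair; reset_index() turns the
# group keys back into ordinary columns so iterrows() can read them by name
mean_points = (
    data.groupby(["job_title", "experience_level"])["salary"]
    .mean()
    .reset_index()
)
print(mean_points)
```

Here the data_science/entry row averages 40000 and 50000 to 45000.0, and each of the four rows carries the `job_title`, `experience_level` and `salary` columns used in the loop.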
Then plot a circle for each mean value:
for _, row in mean_points.iterrows():
    # Pixel offset that shifts the circle onto the "entry" or "senior" strip
    offset = 50 if row["experience_level"] == "entry" else -50
    # Anchor the shape at (job_title, mean salary); x0/x1/y0/y1 are then
    # pixel offsets relative to that anchor point
    fig.add_shape(type='circle',
                  xsizemode='pixel', ysizemode='pixel',
                  xanchor=row["job_title"],
                  yanchor=row["salary"],
                  x0=-5 + offset, x1=5 + offset,
                  y0=-5, y1=5,
                  line=dict(color='black', width=2),
                  fillcolor='red' if row["experience_level"] == "entry" else 'blue',
                  opacity=1)
I used anchor points so that the circle coordinates can be set relative to each strip: row["job_title"] is a category string, so you can't simply do arithmetic on it to shift the x position.
Depending on the experience level, I use a different offset to shift the circle along the x-axis (and I also change the fill color).
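The if/else expressions above only handle two levels. Below is a minimal sketch of the same idea driven by a lookup table, so a third level would only need one more entry. `LEVEL_STYLE` and `mean_marker_kwargs` are hypothetical names; the function just builds the keyword arguments for `fig.add_shape`, reusing the same pixel-size circle settings as the loop above, and the ±50 px offsets are still hard-coded:

```python
# Hypothetical per-level style table: pixel offset along x and fill color.
# The values mirror the hard-coded ones used in the loop above.
LEVEL_STYLE = {
    "entry": {"offset": 50, "fillcolor": "red"},
    "senior": {"offset": -50, "fillcolor": "blue"},
}


def mean_marker_kwargs(job_title, level, mean_salary, radius=5, style=LEVEL_STYLE):
    """Return keyword arguments for fig.add_shape() drawing one mean circle."""
    s = style[level]
    return dict(
        type="circle",
        xsizemode="pixel", ysizemode="pixel",
        xanchor=job_title,       # categorical x position of the strip group
        yanchor=mean_salary,     # data-space y position (the mean)
        x0=s["offset"] - radius, x1=s["offset"] + radius,
        y0=-radius, y1=radius,   # pixel offsets around the anchor
        line=dict(color="black", width=2),
        fillcolor=s["fillcolor"],
        opacity=1,
    )


# Usage with the objects from above (requires `fig` and `mean_points`):
# for _, row in mean_points.iterrows():
#     fig.add_shape(**mean_marker_kwargs(row["job_title"],
#                                        row["experience_level"],
#                                        row["salary"]))
```

Keeping the shape arguments in a plain function also makes them easy to inspect without rendering a figure.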
Then you can show the plot:
fig.update_layout(width=800, height=600)
fig.show()
Result:
One drawback of this solution is that the offsets are hard-coded pixel values, so they only line up for this figure size; it could be improved by computing the offset from the figure size.
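One way to do that, sketched under stated assumptions: approximate the plotting-area width as the figure width minus the left and right margins (80 px each is assumed here as Plotly's default, but the legend that `color=` adds takes extra room, so the real area is narrower), give each category an equal share, and assume the strip group fills 70% of that share (the hypothetical `group_fraction`, chosen to match a `boxgap` of 0.3). `pixel_offset` is a hypothetical helper, not part of Plotly, and the result is only an approximation of where Plotly actually draws each strip:

```python
def pixel_offset(fig_width, n_categories, group_index, n_groups,
                 margin_left=80, margin_right=80, group_fraction=0.7):
    """Approximate horizontal pixel offset of strip `group_index` from the
    center of its category. All defaults are assumptions, not values read
    back from Plotly: real margins and legend size may differ."""
    plot_width = fig_width - margin_left - margin_right
    category_width = plot_width / n_categories
    group_width = group_fraction * category_width
    # Split the group into n_groups equal slots and return the center of the
    # requested slot, measured from the category center (negative = left).
    return (-0.5 + (group_index + 0.5) / n_groups) * group_width


# Example: 800 px wide figure, two job titles, two experience levels
levels = ["entry", "senior"]  # assumed order in which the strips are drawn
for i, level in enumerate(levels):
    print(level, round(pixel_offset(800, 2, i, len(levels)), 1))
```

With these assumptions the offsets come out to about ±56 px, close to the ±50 px used above. In the loop you would replace the hard-coded value with something like `pixel_offset(800, mean_points["job_title"].nunique(), levels.index(row["experience_level"]), len(levels))`. The sign convention (first strip on the left) is also an assumption; flip it if your strips are ordered the other way.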
Hope this helps!