python, matplotlib, scatter-plot

Show Mean in Scatterplot


I have a dataset with three columns. The first holds the actual data, the second holds the mean of the first (repeated on every row), and the third is a category. I want to show both the data and the mean in a scatterplot. However, the mean is plotted N times (once per row) instead of only once, which makes it difficult to control the visualization. Is there a way to plot the mean only once?

Here is a toy example where the orange dots (indicating the mean) are plotted 14 times on top of each other.

import matplotlib.pyplot as plt
import pandas as pd
  

df = pd.DataFrame({
    'Numbers': [10,20,30,40,50,60,11,59,12,58,13,57,25,45],
    'Mean':    [35,35,35,35,35,35,35,35,35,35,35,35,35,35],
    'y':       [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 ]    
    })

fig, ax = plt.subplots(1)
ax.scatter(y="y", x= "Numbers", data=df, s = 200, alpha = .5, label = "Mean response")
ax.scatter(y="y", x= "Mean", data=df, s = 200, alpha = .15, label = "Average mean response")
legend = ax.legend(loc="lower right", framealpha = .4)



Solution

  • Just plot one value:

    fig, ax = plt.subplots(1)
    ax.scatter(y="y", x="Numbers", data=df, s=200,
               alpha=.5, label="Mean response")
    
    ax.scatter(df['Numbers'].mean(), df['y'].mean(),  # plot one value, not 14
               s=200, alpha=.15,
               label="Average mean response")
    
    legend = ax.legend(loc="lower right", framealpha = .4)
    

    output:

    scatter plot with mean

    If you have several groups, compute one mean per group and plot those:

    ax.scatter(x='Numbers', y='y',
               data=df.groupby('y', as_index=False)['Numbers'].mean(),
               s=200, alpha=.15, label="Average mean response")
    

    output:

    scatter plot mean several groups
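
The toy data in the question has only one category, so the grouped version is easier to see with a second category. Below is a minimal, self-contained sketch; `df2` and `group_means` are hypothetical names and the values are made up for illustration. It assumes the category column is `y`, as in the question.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data: same layout as the toy example, but with two categories in 'y'
df2 = pd.DataFrame({
    "Numbers": [10, 20, 30, 40, 50, 60, 12, 22, 32, 42],
    "y":       [1, 1, 1, 1, 1, 1, 2, 2, 2, 2],
})

# One row per category, holding that category's mean of 'Numbers'
group_means = df2.groupby("y", as_index=False)["Numbers"].mean()

fig, ax = plt.subplots(1)
# All raw data points
ax.scatter(x="Numbers", y="y", data=df2, s=200,
           alpha=.5, label="Mean response")
# Exactly one point per category for the mean
ax.scatter(x="Numbers", y="y", data=group_means, s=200,
           alpha=.15, label="Average mean response")
ax.legend(loc="lower right", framealpha=.4)
```

Because `group_means` has exactly one row per category, each mean is drawn once, so `alpha` behaves as expected instead of stacking across overlapping points.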