I want plots like the ones in the figure below, where the blue line is the average line: for each x-coordinate, it shows the mean of the y-coordinates of all data points that share that x value.
I tried the code below:
window_size = 10
df_avg = pd.DataFrame(columns=df.columns)
for col in df.columns:
    df_avg[col] = df[col].rolling(window=window_size).mean()
plt.figure(figsize=(20,20))
for idx, col in enumerate(df.columns, 1):
    plt.subplot(df.shape[1]-4, 4, idx)
    sns.scatterplot(data=df, x=col, y='charges')
    plt.plot(df_avg[col],df['charges'])
    plt.xlabel(col)
This produced the plots shown below, which are obviously not what I wanted.
If you're looking for a purely matplotlib way to do it, here is a possible direction you can take:
import matplotlib.pyplot as plt
import numpy as np
# Create a toy dataset of 500 (x, y) points, i.e. an array of shape (500, 2)
N_points = 500
rand_pts = np.random.choice(50, size=(N_points, 2))
# Map each unique x value to the array of y values observed at that x
rand_dict = {uni: rand_pts[rand_pts[:, 0] == uni, 1] for uni in np.unique(rand_pts[:, 0])}
# Plot
plt.scatter(rand_pts[:, 0], rand_pts[:, 1], s=50)  # scatter plot of the raw points
plt.plot(list(rand_dict.keys()), [np.mean(val) for val in rand_dict.values()], color='tab:orange', lw=4)  # mean y for each unique x
plt.show()
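Since the question already has the data in a pandas DataFrame, the same per-x averaging can probably be done with `groupby` instead of a hand-built dictionary. Note that `rolling(...).mean()` in the question averages consecutive rows in row order, not rows that share an x value, which is why the line looks wrong. The following is only a sketch: the helper names `mean_by_x` and `plot_scatter_with_mean` and the toy values are made up for illustration, and it assumes the y column (`charges` in the question) is numeric.

```python
import matplotlib.pyplot as plt
import pandas as pd


def mean_by_x(df, x_col, y_col):
    """Mean of y_col for every distinct value of x_col, sorted by x."""
    return df.groupby(x_col)[y_col].mean().sort_index()


def plot_scatter_with_mean(df, x_col, y_col, ax=None):
    """Scatter y_col against x_col and overlay the per-x mean line."""
    if ax is None:
        ax = plt.gca()
    ax.scatter(df[x_col], df[y_col], s=20, alpha=0.5)
    means = mean_by_x(df, x_col, y_col)
    ax.plot(means.index, means.to_numpy(), color="tab:orange", lw=3)
    ax.set_xlabel(x_col)
    ax.set_ylabel(y_col)
    return ax


# Toy data standing in for the real DataFrame (hypothetical values).
toy = pd.DataFrame({
    "age": [20, 20, 30, 30, 30, 40],
    "charges": [100.0, 300.0, 150.0, 250.0, 200.0, 500.0],
})
means = mean_by_x(toy, "age", "charges")
```

With the real data, something like `plot_scatter_with_mean(df, col, 'charges', ax=plt.subplot(rows, 4, idx))` inside the subplot loop, followed by `plt.show()`, should draw one mean line per panel. For nearly continuous columns most x values occur only once, so the line will follow the points closely; binning x first may give a smoother average.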