
Density plot for many samples showing overall trend - how?


I'd like to show a density plot for many samples. Each sample belongs to a particular grouping variable. I can plot each individual density plot like so:

import matplotlib.pyplot as plt
import seaborn as sns

fmri = sns.load_dataset("fmri")[['subject','timepoint','region','signal']].drop_duplicates(['subject','timepoint','region'])

region2col = {'parietal': 'red', 'frontal': 'blue'}
fig, ax = plt.subplots(figsize=(22, 10))
# Draw one KDE curve per subject and region
for subject in fmri.subject.unique():
  temp = fmri.loc[fmri.subject == subject]
  for region in temp['region'].unique():
    temp2 = temp.loc[temp.region == region]

    # sns.kdeplot stands in for the deprecated sns.distplot(..., kde=True, hist=False)
    sns.kdeplot(
      x=temp2['signal'],
      label=region,
      color=region2col[region],
      ax=ax
      )


Instead, I'd like to draw a single overall density for each region (same axes as above, signal and density), with a shaded band for the extremes (the maximum and minimum density at each signal value) and a line describing the general trend. Similar to this:

# Example only, to show the formatting wanted.
# The x-axis should show "signal"
# The y-axis should show density
g = sns.relplot(x="timepoint", y="signal",
                hue="region",
                kind="line", data=fmri)
plt.show()


Is this possible?


Solution

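  • This is probably not the fastest method, but you could compute the KDE for each subject/region pair on a common grid of signal values and then let lineplot do the rest. Given several y values at each x, lineplot draws their mean as a line with a shaded 95% confidence interval by default.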

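    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns
    from scipy.stats import gaussian_kde

    # Common grid of signal values on which every KDE is evaluated
    x = np.linspace(fmri['signal'].min(), fmri['signal'].max(), 100)
    # One KDE curve per subject/region, stacked in long form with 'x' as an index level
    temp = fmri.groupby(['subject','region'])['signal'].apply(lambda s: pd.Series(gaussian_kde(s).evaluate(x), index=pd.Index(x, name='x')))
    temp = temp.reset_index(name='kde')

    plt.figure()
    sns.lineplot(data=temp, x='x', y='kde', hue='region')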
    

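The band drawn by lineplot is a confidence interval for the mean, not the minimum and maximum the question asks for. A minimal sketch of one way to get a min/max band is to stack the per-subject KDE curves and shade between their pointwise extremes with matplotlib's fill_between. The data here is synthetic (made-up values standing in for fmri, so the example runs offline), and density_envelope is a hypothetical helper name, not part of seaborn or scipy.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde

# Synthetic stand-in for the fmri data (assumption: invented values, not the real dataset)
rng = np.random.default_rng(0)
rows = []
for region, loc in [("parietal", 0.0), ("frontal", 0.1)]:
    for subject in range(8):
        signal = rng.normal(loc + rng.normal(0, 0.02), 0.1, size=30)
        rows.append(pd.DataFrame({"subject": subject, "region": region, "signal": signal}))
data = pd.concat(rows, ignore_index=True)

# Common grid of signal values on which every KDE is evaluated
x = np.linspace(data["signal"].min(), data["signal"].max(), 100)


def density_envelope(df, grid):
    """Hypothetical helper: per region, the pointwise mean, min and max
    of the per-subject KDE curves evaluated on `grid`."""
    out = {}
    for region, region_df in df.groupby("region"):
        curves = np.vstack([
            gaussian_kde(subject_df["signal"].to_numpy()).evaluate(grid)
            for _, subject_df in region_df.groupby("subject")
        ])
        out[region] = (curves.mean(axis=0), curves.min(axis=0), curves.max(axis=0))
    return out


envelope = density_envelope(data, x)

region2col = {"parietal": "red", "frontal": "blue"}
fig, ax = plt.subplots()
for region, (mean, lower, upper) in envelope.items():
    # Shaded band spans the extremes; the line is the mean density across subjects
    ax.fill_between(x, lower, upper, color=region2col[region], alpha=0.2)
    ax.plot(x, mean, color=region2col[region], label=region)
ax.set_xlabel("signal")
ax.set_ylabel("density")
ax.legend()
plt.show()
```

With seaborn 0.12 or later, passing errorbar=("pi", 100) to sns.lineplot should give a similar result, since a 100% percentile interval covers the full range of values at each x. Treat that as an assumption to check against your installed version.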