I originally used the numpy function .std on my dataframe to obtain the standard deviation and plotted it using matplotlib. Later, I tried making the same graph using seaborn. The two graphs looked close enough until I overlaid them and found that all error bars from seaborn are smaller, with the difference more pronounced the bigger they are. I checked in different software that the results from .std are correct and that they are also plotted correctly. What could be the source of the problem? (I can't seem to pull the graph's source data out of seaborn.)
I used this code:
ax_sns = sns.barplot(x = 'name', y = column_to_plot, data=data, hue='method', capsize=0.1, ci='sd', errwidth=0.9)
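You didn't provide the code where you calculated the standard deviation. Perhaps you used pandas' .std(). Seaborn uses numpy's. Pandas' std applies "Bessel's correction" by default (ddof=1, dividing by n-1), whereas numpy's std divides by n (ddof=0), so numpy's result is always the smaller of the two. The difference is most visible when the number of data points is small (when the gap between / n and / (n-1) is larger).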
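As a quick check of the two defaults, here is a minimal sketch on made-up numbers (the values are illustrative, not taken from the question's data). It assumes only that numpy and pandas are installed, and shows that either library reproduces the other once ddof is set explicitly.

```python
import numpy as np
import pandas as pd

# Made-up sample, chosen so the population std comes out as a round number
values = pd.Series([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

numpy_default = np.std(values.to_numpy())   # ddof=0: divides by n
pandas_default = values.std()               # ddof=1: divides by n - 1

# Each library can match the other when ddof is passed explicitly
numpy_with_bessel = np.std(values.to_numpy(), ddof=1)
pandas_without_bessel = values.std(ddof=0)

n = len(values)
ratio = pandas_default / numpy_default      # equals sqrt(n / (n - 1))

print(f"numpy default:  {numpy_default:.4f}")
print(f"pandas default: {pandas_default:.4f}")
print(f"ratio:          {ratio:.4f}")
```

The ratio between the two is sqrt(n / (n-1)), which approaches 1 as n grows. That is why the mismatch shrinks in relative terms for large groups, while in absolute terms it grows with the size of the error bar.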
The following code visualizes the difference between error bars calculated via seaborn, numpy and pandas.
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
flights = sns.load_dataset('flights')
fig, ax = plt.subplots(figsize=(12, 5))
sns.barplot(x='month', y='passengers', data=flights, capsize=0.1, ci='sd', errwidth=0.9, fc='yellow', ec='blue', ax=ax)
flights['month'] = flights['month'].cat.codes # change to a numeric format
for month, data in flights.groupby('month'):
    mean = data['passengers'].mean()
    pandas_std = data['passengers'].std()    # pandas default: ddof=1
    numpy_std = np.std(data['passengers'])   # numpy default: ddof=0
    ax.errorbar(month - 0.2, mean, yerr=numpy_std, ecolor='crimson', capsize=8,
                label='numpy std()' if month == 0 else None)
    ax.errorbar(month + 0.2, mean, yerr=pandas_std, ecolor='darkgreen', capsize=8,
                label='pandas std()' if month == 0 else None)
ax.margins(x=0.015)
ax.legend()
plt.tight_layout()
plt.show()
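Since the plotted values are hard to pull back out of seaborn, one way to compare numbers directly is to compute both variants per group yourself. The following is only a sketch: the DataFrame, its 'name' and 'value' columns, and the output column names are hypothetical stand-ins for the question's data.

```python
import pandas as pd

# Hypothetical stand-in for the question's dataframe
df = pd.DataFrame({
    'name':  ['a', 'a', 'a', 'b', 'b', 'b', 'b'],
    'value': [1.0, 2.0, 3.0, 2.0, 4.0, 4.0, 6.0],
})

# Per-group mean plus both flavours of standard deviation
summary = df.groupby('name')['value'].agg(
    mean='mean',
    std_ddof1='std',                      # pandas default, divides by n - 1
    std_ddof0=lambda s: s.std(ddof=0),    # numpy-style, divides by n
)

print(summary)
```

Passing the std_ddof0 column as yerr to matplotlib should then give error bars that line up with the numpy-style ones drawn in the plot above.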
PS: Some related posts with additional information: