I have a dataframe of 100 rows of floats ranging from 0.000001 to 0.001986 that I wish to plot on a seaborn histplot, separated by class. I started with:
sns.histplot(data=df, x='score', hue='test_result', kde=True, color='red',
stat='probability', multiple='layer')
plt.show()
However, my bins were overlapping significantly. I added binwidth=0.000000001 to the histplot call to scale the bins to scientific notation, but this code took over 2 hours to run.
My question is: is there a more computationally efficient way to do this conversion? I need to run the same code for multiple dataframes of similar size. If not, is there a better way to improve the readability of the x-axis bins than using scientific notation? Thanks!
Since this question has been reopened, I'll provide my answer below.
import matplotlib.pyplot as plt
import seaborn as sns

sns.histplot(data=df, x='score', hue='test_result', kde=True,
             color='red', stat='probability', multiple='layer')
plt.ticklabel_format(axis='x', style='sci', scilimits=(-4, -4))
plt.show()
My understanding is that this displays the x-axis tick labels of my histplot in scientific e notation, rather than trying to force the conversion by setting a binwidth such as 0.000000001.
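As a rough check on why the binwidth approach was so slow, the arithmetic below estimates how many bins that setting asks for. This is only a sketch: the range and binwidth come from the question, the 1e-4 comparison value is a hypothetical choice of mine, and the assumption that bin count is what drove the 2 hour runtime is a likely explanation rather than something I have profiled.

```python
import math

# Score range and binwidth taken from the question.
score_min, score_max = 0.000001, 0.001986
tiny_binwidth = 0.000000001

# Covering the data range takes roughly (range / binwidth) bins.
n_bins_tiny = math.ceil((score_max - score_min) / tiny_binwidth)
print(n_bins_tiny)  # on the order of two million bins for only 100 rows

# Hypothetical coarser binwidth, shown purely for comparison.
coarse_binwidth = 1e-4
n_bins_coarse = math.ceil((score_max - score_min) / coarse_binwidth)
print(n_bins_coarse)
```

So binwidth controls how the data are binned, not how the axis is labelled, which is why changing the tick formatter is the cheaper route here.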
It's worth noting that, for a similar use case, a colleague of mine had some old code on an older version of seaborn/matplotlib that worked using the binwidth method. How? (Perhaps a comment can provide an explanation.)
For those with the same overlapping bins issue: with my data this changed the x-axis tick labels to multiples of 5 (in units of 1e-4), fixing that issue as well as the one mentioned by JohanC in a comment. From the matplotlib documentation regarding scilimits:
Use (0, 0) to include all numbers. Use (m, m) where m != 0 to fix the order of magnitude to 10^m. The formatter default is rcParams["axes.formatter.limits"] (default: [-5, 6]).
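To see the (m, m) behaviour without the original dataframe, here is a minimal self-contained sketch. It assumes synthetic uniform data in the question's range and uses a plain matplotlib histogram in place of sns.histplot; since seaborn draws onto a matplotlib Axes, the same ticklabel_format call should apply there too. The names scores and formatter are mine, not from the question.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-in for the 'score' column (assumption: 100 values in the question's range).
rng = np.random.default_rng(0)
scores = rng.uniform(0.000001, 0.001986, size=100)

fig, ax = plt.subplots()
ax.hist(scores, bins=20)

# scilimits=(-4, -4) pins the x-axis exponent at 10**-4.
ax.ticklabel_format(axis="x", style="sci", scilimits=(-4, -4))

# The formatter only computes its exponent once tick locations are set, so draw first.
fig.canvas.draw()
formatter = ax.xaxis.get_major_formatter()
print(formatter.orderOfMagnitude)
```

With the exponent pinned, a tick at 0.0005 is labelled 5 and the shared 1e-4 factor appears once at the end of the axis, which keeps the labels short enough not to collide.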