I am plotting some lines with Seaborn:
```python
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd

# dfs: dict[str, DataFrame]
fig, ax = plt.subplots()
for label, df in dfs.items():
    sns.lineplot(
        data=df,
        x="Iteration",
        y="Loss",
        errorbar="sd",
        label=label,
        ax=ax,
    )
ax.set(xscale='log', yscale='log')
```
The result looks like this.
Note the clipped lower error band of the "effector_final_velocity" curve: for those iterations the standard deviation of the loss is larger than its mean, so mean - sd is negative and cannot be drawn on a log axis.
However, if `ax.set(xscale='log', yscale='log')` is called before the looped calls to `sns.lineplot`, the result looks like this:
I'm not sure where the unclipped values come from.
Looking at the source of `seaborn.relational`: at the end of `lineplot`, the `plot` method of a `_LinePlotter` instance is called. It plots the error bands by passing the already-computed standard deviation bounds to `ax.fill_between`.
Inspecting these bounds right before they are passed to `ax.fill_between`, the negative values (which would be clipped) are still present. So I assumed the "unclipping" must be something matplotlib does during the call to `ax.fill_between`, since `_LinePlotter.plot` appears to do no other relevant post-transformation of the data before it returns, and `lineplot` returns immediately afterwards.
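One way to do that inspection without editing Seaborn's source is to wrap `fill_between` on the specific `Axes` instance and record the arrays it receives. This is a minimal sketch: `spy_fill_between` is a hypothetical helper, and using it with Seaborn assumes `lineplot` looks up `ax.fill_between` on the instance at call time (that is how I read `_LinePlotter.plot`, but treat it as an assumption). Here it is exercised with a direct call so it runs with matplotlib alone.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np


def spy_fill_between(ax):
    """Hypothetical helper: wrap ax.fill_between on this Axes instance and
    record the (x, y1, y2) arrays of every call before delegating."""
    calls = []
    original = ax.fill_between  # the bound method of this Axes

    def wrapper(x, y1, y2=0, *args, **kwargs):
        calls.append((np.asarray(x), np.asarray(y1), np.asarray(y2)))
        return original(x, y1, y2, *args, **kwargs)

    ax.fill_between = wrapper  # shadows the method on this instance only
    return calls


fig, ax = plt.subplots()
calls = spy_fill_between(ax)
# With Seaborn, sns.lineplot(..., ax=ax) would go here instead.
ax.fill_between([0, 1, 2], [-0.5, 0.1, 0.2], [1.0, 2.0, 3.0])
for x, y1, y2 in calls:
    print("min lower bound:", y1.min())
```

Because the wrapper only records and then delegates, the plot itself is unchanged; the recorded `y1` arrays are exactly what matplotlib receives.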
However, consider a small example that calls `fill_between` where some of the lower bounds are negative:
```python
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots(1, 1, figsize=(5, 5))
np.random.seed(5678)
ax.fill_between(
    np.arange(10),
    np.random.random((10,)) - 0.2,  # values in [-0.2, 0.8), so some can be negative
    np.random.random((10,)) + 0.75,
)
ax.set_yscale('log')
```
Here it makes no difference whether `ax.set_yscale('log')` is called before or after `ax.fill_between`; in both cases the result is this:
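This is consistent with how matplotlib handles log scales: as far as I understand, `fill_between` stores the polygon in data coordinates, and non-positive values are only clipped by the log transform when the figure is drawn. A quick check, assuming a headless Agg backend and using deterministic bounds instead of the random ones above, is to compare the stored vertices for both orderings (`stored_min_y` is a hypothetical helper):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import numpy as np


def stored_min_y(set_log_first):
    """Return the smallest y-vertex stored by fill_between, with the log
    scale applied either before or after the call."""
    fig, ax = plt.subplots()
    x = np.arange(10)
    lower = np.linspace(-0.2, 0.5, 10)  # the first few lower bounds are negative
    upper = lower + 1.0
    if set_log_first:
        ax.set_yscale("log")
    poly = ax.fill_between(x, lower, upper)
    if not set_log_first:
        ax.set_yscale("log")
    fig.canvas.draw()  # rendering is where the log transform is applied
    ymin = poly.get_paths()[0].vertices[:, 1].min()
    plt.close(fig)
    return ymin


print(stored_min_y(True), stored_min_y(False))
```

In both orderings the negative value is still in the stored path, so the ordering effect in the Seaborn plot has to come from somewhere before `fill_between` is called.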
I've searched the Seaborn and matplotlib documentation, as well as SO and elsewhere, but haven't found any explanation of what is going on here.
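The answer is straightforward: the unclipped error bands are reproduced by applying the `(mean - std, mean + std)` calculation to the log-transformed data and mapping the resulting bounds back to the original scale. If the axis is already log-scaled when `sns.lineplot` is called, Seaborn aggregates in that transformed space, so the lower bound is always positive; if the scale is set afterwards, the aggregation is done on the raw values and the negative lower bounds are clipped when drawn.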
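The two bands can be compared directly with NumPy. This sketch assumes, as described above, that the log-axis band is mean ± sd of log10 of the values mapped back with `10 **`; `log_space_sd_band` is a hypothetical helper, not a Seaborn function, and the four loss values are made up, with one outlier so that sd > mean. I use the sample sd (`ddof=1`); the exact `ddof` does not change the sign argument.

```python
import numpy as np


def log_space_sd_band(values):
    """Hypothetical helper: mean +/- sample sd computed on log10(values),
    with the centre and both bounds mapped back via 10**."""
    logs = np.log10(np.asarray(values, dtype=float))  # values must be > 0
    centre = logs.mean()
    sd = logs.std(ddof=1)
    return 10 ** centre, 10 ** (centre - sd), 10 ** (centre + sd)


# Made-up losses at one iteration: three small values and one outlier.
losses = np.array([0.01, 0.02, 0.03, 1.0])

# Linear-space band: the lower bound is negative, so a log axis must clip it.
linear_lower = losses.mean() - losses.std(ddof=1)

# Log-space band: a geometric mean with multiplicative bounds, always > 0.
centre, lower, upper = log_space_sd_band(losses)

print(f"linear lower bound: {linear_lower:.3f}")
print(f"log-space band: {lower:.4f} .. {upper:.4f} around {centre:.4f}")
```

The centre is the geometric mean of the values, and the bounds are that mean divided and multiplied by the same factor, which is why they can never reach zero. The log base does not matter: switching between `log10` and `ln` rescales the mean and the sd by the same constant, so the mapped-back bounds are identical.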