Tags: python, matplotlib, seaborn

How are negative errorbar bounds transformed when log axis scaling is applied before constructing a Seaborn lineplot?


I am plotting some lines with Seaborn:


import matplotlib.pyplot as plt 
import seaborn as sns
import pandas as pd

# dfs: dict[str, DataFrame]

fig, ax = plt.subplots()

for label, df in dfs.items():
    sns.lineplot(
        data=df,
        x="Iteration",
        y="Loss",
        errorbar="sd",
        label=label,
        ax=ax,
    )

ax.set(xscale='log', yscale='log')

The result looks like this.

Note the clipped negative values in the lower error band of the "effector_final_velocity" curve: at those iterations the standard deviation of the loss is larger than its mean, so the lower bound mean - sd is negative and falls off the log-scaled axis.

However, if ax.set(xscale='log', yscale='log') is called before the looped calls to sns.lineplot, the result looks like this.

I'm not sure where the unclipped values come from.

Looking at the source of seaborn.relational: at the end of lineplot, the plot method of a _LinePlotter instance is called. It plots the error bands by passing the already-computed standard deviation bounds to ax.fill_between.

Inspecting these bounds right before they are passed to ax.fill_between, the negative values (which would be clipped) are still present. I therefore assumed that the "unclipping" must be something matplotlib does during the call to ax.fill_between, since _LinePlotter.plot appears to perform no other relevant transformation of the data before it returns, and lineplot returns immediately afterward.

However, consider a small example that calls fill_between where some of the lower bounds are negative:

import numpy as np 

fig, ax = plt.subplots(1, 1, figsize=(5, 5))

np.random.seed(5678)

ax.fill_between(
    np.arange(10), 
    np.random.random((10,)) - 0.2,
    np.random.random((10,)) + 0.75,
)

ax.set_yscale('log')

Here it makes no difference whether ax.set_yscale('log') is called before or after ax.fill_between; in both cases the result is this.
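A plausible reason the order doesn't matter for a bare fill_between call: matplotlib applies the scale transform at draw time, and its log transform maps nonpositive inputs to NaN or -inf (or clips them to a large negative number), so they cannot be drawn either way. This can be sketched with matplotlib's LogTransform directly (using the "mask" handling of nonpositive values rather than the default "clip"; the input values are made up):

```python
import numpy as np
from matplotlib.scale import LogTransform

# With nonpositive="mask", values <= 0 come out as nan/-inf instead of
# being clipped to a large negative number; either way they are not drawn.
tr = LogTransform(10, nonpositive="mask")
out = tr.transform_non_affine(np.array([-0.5, 0.0, 1.0, 10.0]))
print(out)  # nan and -inf for the nonpositive inputs, log10 otherwise
```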

I've spent some time searching the Seaborn and matplotlib documentation, and looked for answers on Stack Overflow and elsewhere, but I haven't found an explanation of what is going on here.


Solution

  • The answer is straightforward: when the axis is already log-scaled, seaborn performs the aggregation on the log-transformed data. The unclipped error bands can be reproduced by applying the (mean - std, mean + std) calculation to the log-transformed values and exponentiating back, which always yields strictly positive bounds.
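A minimal numpy sketch of the difference between the two aggregation orders (the sample values are made up, chosen so that the std exceeds the mean):

```python
import numpy as np

# Hypothetical loss samples at a single iteration, with std > mean
samples = np.array([0.01, 0.02, 0.5])

# Interval computed in linear space: the lower bound is negative,
# so a log-scaled axis clips it away.
lo_lin = samples.mean() - samples.std(ddof=1)
hi_lin = samples.mean() + samples.std(ddof=1)

# Interval computed in log space, then exponentiated back:
# both bounds are strictly positive, so nothing is clipped.
logs = np.log10(samples)
lo_log = 10 ** (logs.mean() - logs.std(ddof=1))
hi_log = 10 ** (logs.mean() + logs.std(ddof=1))

print(lo_lin < 0, lo_log > 0)
```

This is exactly the asymmetry visible in the two plots: the first computes the interval in data space and clips, the second computes it in log space and stays positive.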