Seaborn kde plot plotting probabilities instead of density (histplot without bars)


I have a question about seaborn's kdeplot. In histplot, one can choose which statistic to show (count, frequency, density, probability), and when the kde argument is used, that choice also applies to the KDE curve. However, I have not found a way to set this directly in kdeplot, for when I want just the KDE curve scaled as probabilities. Alternatively, the same result could come from histplot if the bars could be switched off, but I have not found a way to do that either. How can this be done?

To give a visual example, I would like to have just the red curve, i.e. either pass an argument to kdeplot to use probabilities, or remove the bars from histplot:

import matplotlib.pyplot as plt
import seaborn as sns

penguins = sns.load_dataset("penguins")
sns.histplot(data=penguins, x="flipper_length_mm", kde=True, stat="probability", color="r", label="probabilities")
sns.kdeplot(data=penguins, x="flipper_length_mm", color="k", label="kde density")
plt.legend()

im1 - kde and hist plot

Thanks a lot.


Solution

  • The y-axis of a histplot with stat="probability" corresponds to the probability that a value belongs to a certain bar. The value of 0.23 for the highest bar means that there is a probability of about 23% that a flipper length is between 189.7 and 195.6 mm (the edges of that specific bin). Note that here 10 bins are spread out between the minimum and maximum value encountered; by default, seaborn chooses the number of bins automatically.

    The y-axis of a kdeplot is similar to a probability density function. The height of the curve is proportional to the approximate probability of a value being within a bin of width 1 around the corresponding x-value. A value of 0.031 for x=191 means there is a probability of about 3.1% that the length is between 190.5 and 191.5 mm.

    Now, to directly get probability values next to a kdeplot, first a bin width needs to be chosen. Then the y-values can be multiplied by that bin width to correspond to the probability of an x-value being within a bin of that width. The PercentFormatter provides a way to set such a correspondence, using ax.yaxis.set_major_formatter(PercentFormatter(1/binwidth)).

    The code below illustrates an example with a binwidth of 5 mm, and how a histplot can match a kdeplot.

    import matplotlib.pyplot as plt
    import seaborn as sns
    from matplotlib.ticker import PercentFormatter
    
    fig, ax1 = plt.subplots()
    penguins = sns.load_dataset("penguins")
    binwidth = 5
    sns.histplot(data=penguins, x="flipper_length_mm", kde=True, stat="probability", color="r", label="Probabilities",
                 binwidth=binwidth, ax=ax1)
    ax2 = ax1.twinx()
    sns.kdeplot(data=penguins, x="flipper_length_mm", color="k", label="kde density", ls=':', lw=5, ax=ax2)
    ax2.set_ylim(0, ax1.get_ylim()[1] / binwidth)  # similar limits on the y-axis to align the plots
    ax2.yaxis.set_major_formatter(PercentFormatter(1 / binwidth))  # show axis such that 1/binwidth corresponds to 100%
    ax2.set_ylabel(f'Probability for a bin width of {binwidth}')
    ax1.legend(loc='upper left')
    ax2.legend(loc='upper right')
    plt.show()
    

    example plot

    PS: To only show the kdeplot with a probability, the code could be:

    binwidth = 5
    ax = sns.kdeplot(data=penguins, x="flipper_length_mm")
    ax.yaxis.set_major_formatter(PercentFormatter(1 / binwidth))  # show axis such that 1/binwidth corresponds to 100%
    ax.set_ylabel(f'Probability for a bin width of {binwidth}')
    

    Another option could be to draw a histplot with kde=True, and remove the generated bars. To be interpretable, a binwidth should be set. With binwidth=1 you'd get the same y-axis as a density plot. (kde_kws={'cut': 3} lets the kde go smoothly to about zero; by default, the kde curve is cut off at the minimum and maximum of the data.)

    ax = sns.histplot(data=penguins, x="flipper_length_mm", binwidth=1, kde=True, stat='probability', kde_kws={'cut': 3})
    ax.containers[0].remove() # remove the bars
    ax.relim() # the axis limits need to be recalculated without the bars
    ax.autoscale_view()
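
The rescaling used throughout this answer (density times bin width is roughly the probability of landing in that bin) can be checked numerically. The sketch below is only an illustration under stated assumptions: it uses synthetic, normally distributed values as a stand-in for the flipper lengths, and it builds the KDE with scipy.stats.gaussian_kde at its default bandwidth, which may not match seaborn's curve exactly. It compares density * binwidth against the KDE's integral over the same bin, and shows the label that PercentFormatter(1 / binwidth) would produce for that density value.

```python
import numpy as np
from matplotlib.ticker import PercentFormatter
from scipy.stats import gaussian_kde

# Synthetic stand-in for flipper lengths (hypothetical values, not the penguins data)
rng = np.random.default_rng(0)
lengths = rng.normal(loc=200, scale=14, size=500)

# Gaussian KDE with SciPy's default bandwidth (an assumption; seaborn's settings may differ)
kde = gaussian_kde(lengths)

binwidth = 5
x = 195.0
density = kde(x)[0]  # height of the density curve at x

# Approximate probability of a value falling in a bin of width `binwidth` centred on x
approx_prob = density * binwidth
# Probability mass of the KDE over that same bin, by integration
exact_prob = kde.integrate_box_1d(x - binwidth / 2, x + binwidth / 2)

# Label that PercentFormatter(1 / binwidth) produces for this density value
formatter = PercentFormatter(xmax=1 / binwidth, decimals=1)
label = formatter.format_pct(density, 1)

print(f"density at x={x}: {density:.4f}")
print(f"density * binwidth: {approx_prob:.4f}")
print(f"integral over the bin: {exact_prob:.4f}")
print(f"tick label: {label}")
```

The two probabilities agree closely as long as the bin is narrow relative to the spread of the data; for wide bins the curve bends noticeably within the bin and the product becomes a rougher approximation.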