python statistics seaborn histogram kernel-density

How to Generate Two Separate Y-Axes For A Histogram on the Same Figure In Seaborn

I'd like to generate a single figure that has two y axes: Count (from the histogram) and Density (from the KDE).

I want to use sns.displot in Seaborn >= v 0.11.

import seaborn as sns

df = sns.load_dataset('tips')

# graph 1: This should be the Y-Axis on the left side of the figure
sns.displot(df['total_bill'], kind='hist', bins=10)

# graph 2: This should be the Y-axis on the right side of the figure
sns.displot(df['total_bill'], kind='kde')

The code I've written generates two separate figures. I could use a facet grid to show them as two separate graphs, but I'd rather be more concise and combine them into a single figure: both plots share the same x-axis, with the Count axis on the left and the Density axis on the right.

Solution

displot() is a figure-level function: it creates its own figure, which can contain multiple subplots, and it does not accept an ax= parameter. As such, you can't tell it to draw onto an axes you created yourself.

To create combined plots, you can use the underlying axes-level functions: histplot() and kdeplot() in Seaborn v0.11. These functions accept an ax= parameter. ax.twinx() creates a second axes that shares the x-axis and has its own y-axis on the right.

import matplotlib.pyplot as plt
import seaborn as sns

df = sns.load_dataset('tips')

fig, ax = plt.subplots()

sns.histplot(df['total_bill'], bins=10, ax=ax)

ax2 = ax.twinx()
sns.kdeplot(df['total_bill'], ax=ax2)

plt.tight_layout()
plt.show()

Edit:

As mentioned in the comments, the two y-axes aren't aligned. The left axis only describes the histogram: for example, the highest bin having a height of 68 means that exactly 68 total bills lie between 12.618 and 17.392. The right axis only describes the KDE: for example, a y-value of 0.043 at x=20 means that the probability of a total bill between 19.5 and 20.5 is about 4.3% (the density multiplied by an interval width of 1).

To align both axes, similar to sns.histplot(..., kde=True), the area of the histogram can be calculated (bin width times number of data values) and used as a scaling factor. Such scaling makes the area of the histogram and the area below the KDE curve equal when measured in pixels:

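num_bins = 10
# histplot spreads its bins evenly between the minimum and maximum value
bin_width = (df['total_bill'].max() - df['total_bill'].min()) / num_bins
# area of the histogram: number of data values times bin width
hist_area = len(df) * bin_width
# top of the density axis = top of the count axis, converted to density units
ax2.set_ylim(ymax=ax.get_ylim()[1] / hist_area)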

Note that the right axis would read more like a percentage if the histogram used a bin width that is a power of ten (e.g. sns.histplot(..., bins=np.arange(0, df['total_bill'].max()+10, 10))). Which bins are most suitable depends strongly on how you want to interpret your data.
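
To see why a bin width of 10 makes the density axis easier to read, the relationship between counts and densities can be checked numerically. This is a minimal sketch using only NumPy; the synthetic sample is an assumption standing in for df['total_bill']:

```python
import numpy as np

# Synthetic sample standing in for df['total_bill'] (illustration only).
rng = np.random.default_rng(0)
values = rng.gamma(shape=4.0, scale=5.0, size=244)

# Bin edges at 0, 10, 20, ... covering the whole sample.
bin_width = 10
edges = np.arange(0, values.max() + bin_width, bin_width)

# The same histogram expressed as counts and as densities.
counts, _ = np.histogram(values, bins=edges)
density, _ = np.histogram(values, bins=edges, density=True)

# Scaling factor between the two axes: number of data values * bin width.
hist_area = len(values) * bin_width

# count = density * hist_area, so density * bin_width is the fraction per bin.
fraction_per_bin = density * bin_width
```

Because the bin width is 10, multiplying a density by 10 gives the fraction of data values in that bin, so a tick of 0.04 on the right axis lines up with a bin holding 40% of the data.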