Tags: python, seaborn, kernel-density

Getting the plot points for a kernel density estimate in seaborn


I am using this code:

kde = sns.kdeplot(x = data, fill = True, color = "black", alpha = 0.1)

to get the KDE for my data, and it works well. I am now trying to get all the (x, y) points used to draw the plot, and I am doing:

poly = kde.collections[0]
x_values = poly.get_paths()[0].vertices[:, 0]
y_values = poly.get_paths()[0].vertices[:, 1]

However, the x_values increase and then decrease. Why? I understand that the y_values should increase and then decrease, but I expected the x_values to only increase, since the curve is drawn from left to right. Apart from this behaviour, the values of the points are reasonable and match the plot.


Solution

  • With fill=True, a filled polygon is created, and its vertices trace the whole outline rather than only the curve. ax.text() can be used to show the index of each point along that outline. It seems the points first run along the base of the polygon and then go back along its upper edge, which is why the x-values increase and then decrease.

    The code below labels every 10th point with its position in that subsample. It names the return value ax rather than kde, because sns.kdeplot returns the matplotlib Axes it drew on, so kde is a slightly confusing name.

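    import matplotlib.pyplot as plt
    import numpy as np
    import seaborn as sns

    # sns.kdeplot returns the matplotlib Axes it drew on
    ax = sns.kdeplot(x=np.random.randn(200).cumsum(), fill=True, color="black", alpha=0.1)

    # with fill=True, the first collection holds the filled polygon
    poly = ax.collections[0]
    x_values = poly.get_paths()[0].vertices[:, 0]
    y_values = poly.get_paths()[0].vertices[:, 1]

    # label every 10th vertex with its position in the subsample
    for i, (x, y) in enumerate(zip(x_values[::10], y_values[::10])):
        ax.text(x, y, i, ha='center', va='center', color='b')

    plt.show()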
    

    [Figure: sns.kdeplot showing the index of every 10th point of the polygon]

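    To get only the points of the curve, you could draw the kdeplot with fill=False and read the coordinates from the resulting line. The example below draws that line temporarily on top of the filled plot and removes it again afterwards.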

    import matplotlib.pyplot as plt
    import numpy as np
    import seaborn as sns

    data = np.random.randn(200).cumsum()
    ax = sns.kdeplot(x=data, fill=True, color="black", alpha=0.1)
    sns.kdeplot(x=data, fill=False, color="red", ax=ax)  # temporarily draw a curve

    x_values, y_values = ax.lines[0].get_data()  # get the coordinates of the curve
    ax.lines[0].remove()  # remove the curve again

    # label every 10th curve point with its position in the subsample
    for i, (x, y) in enumerate(zip(x_values[::10], y_values[::10])):
        ax.text(x, y, i, ha='center', va='center', color='b')

    plt.show()
    

    [Figure: sns.kdeplot with numbered points of the curve]

    It's much safer to use the curve instead of the polygon, as the polygon's direction and starting point might differ in future versions. The very first point of the polygon is probably the first point of the curve (to "close" the polygon). The curve never reaches exactly zero. You can use the cut= parameter to extend the curve further into the almost-zero region. Note that the KDE is only an approximation of the PDF. Its fidelity depends on how closely the underlying distribution locally resembles a Gaussian.