python, pandas, dataframe, kernel-density

Python, Pandas: How to change the bandwidth selection for DataFrame.plot.density()?


I have placed some data into a pandas DataFrame and plotted a bar plot of the unique value counts for a particular column.

I would like to control the bandwidth of the pandas built-in df.plot.density() function, which plots the KDE over the data. Is this possible, or am I better off with scikit-learn, SciPy, or something else?

Thanks
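Some background on what "bandwidth" means here: pandas computes its density plot with SciPy's scipy.stats.gaussian_kde, whose bw_method argument accepts 'scott' (the default), 'silverman', a scalar, or a callable. Below is a minimal sketch, using made-up normal data, of how a scalar bw_method turns into the width of the Gaussian kernel. The sample data and variable names are illustrative only.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
y = rng.standard_normal(100)  # made-up sample data for illustration

# A scalar bw_method is used directly as the bandwidth *factor*:
# the kernel's standard deviation is factor * sample std (ddof=1).
for factor in (0.25, 0.75):
    kde = gaussian_kde(y, bw_method=factor)
    kernel_std = np.sqrt(kde.covariance[0, 0])
    print(f"factor={factor}: kernel std = {kernel_std:.3f}")

# With no bw_method, Scott's rule is used: factor = n ** (-1/5) for 1-D data
default = gaussian_kde(y)
print(default.factor, len(y) ** (-1 / 5))

# Evaluate the estimated density on a grid of points
grid = np.linspace(-4, 4, 200)
density = default(grid)
```

Because the scalar is a multiplier on the sample spread, the same value gives a wider kernel for widely spread data and a narrower one for tightly clustered data.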


Solution

  • As pointed out by @Jan, you could use seaborn for this; it makes it easy to control the bandwidth of a KDE plot. Here is an example with random normal data:

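    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns
    
    d = pd.DataFrame({'x': np.random.choice(['a', 'b', 'c'], 100),
                      'y': np.random.randn(100)})
    
    fig, axes = plt.subplots(1, 3, figsize=(12, 4))
    for name, g in d.groupby('x'):
        # pandas: default bandwidth (Scott's rule)
        g['y'].plot.density(ax=axes[0], label=name)
        # seaborn >= 0.11: bw_method sets the bandwidth factor
        # (older releases used the now-deprecated bw argument)
        sns.kdeplot(x=g['y'], bw_method=0.25, ax=axes[1], label=name)
        sns.kdeplot(x=g['y'], bw_method=0.75, ax=axes[2], label=name)
    
    axes[0].set_title('pandas plot.density', fontsize=12)
    axes[1].set_title('seaborn kde with\nbw_method=0.25', fontsize=12)
    axes[2].set_title('seaborn kde with\nbw_method=0.75', fontsize=12)
    
    # plt.legend() would only label the last axes, so add one per panel
    for ax in axes:
        ax.legend()
    plt.show()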
    

    This produces three side-by-side panels to compare: the default pandas density estimate, and seaborn KDEs with bandwidth settings of 0.25 and 0.75.
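It is also possible without seaborn: pandas' Series.plot.density() (an alias of plot.kde()) accepts a bw_method argument that it forwards to scipy.stats.gaussian_kde ('scott', 'silverman', a scalar factor, or a callable). A minimal sketch on made-up data of the same shape as the seaborn example; the 0.25 and 0.75 values are just illustrative choices:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Made-up data: a categorical column 'x' and a normal column 'y'
rng = np.random.default_rng(0)
d = pd.DataFrame({'x': rng.choice(['a', 'b', 'c'], 100),
                  'y': rng.standard_normal(100)})

fig, axes = plt.subplots(1, 3, figsize=(12, 4), sharey=True)
# bw_method is passed through to scipy.stats.gaussian_kde;
# None means the default (Scott's rule)
for ax, bw in zip(axes, [None, 0.25, 0.75]):
    for name, g in d.groupby('x'):
        g['y'].plot.density(bw_method=bw, ax=ax, label=name)
    ax.set_title(f'pandas plot.density, bw_method={bw}', fontsize=12)
    ax.legend()

fig.tight_layout()
plt.show()
```

Smaller bw_method values give a narrower kernel and a bumpier curve; larger values smooth it out. The same argument works on a whole DataFrame via df.plot.density(bw_method=...).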