Tags: python, pandas, matplotlib, time-series, kernel-density

How to plot kernel density plot of dates in Pandas?


I have a pandas DataFrame where each observation has a date (stored in a column of datetime64 dtype). These dates are spread over a period of about 5 years. I would like to plot a kernel-density plot of the dates of all the observations, with the years labelled on the x-axis.

I have figured out how to create a time-delta relative to some reference date and then create a density plot of the number of hours/days/years between each observation and the reference date:

    df['relativeDate'].astype('timedelta64[D]').plot(kind='kde')

But this isn't exactly what I want: If I convert to year-deltas, then the x-axis is right but I lose the within-year variation. But if I take a smaller unit of time like hour or day, the x-axis labels are much harder to interpret.

What's the simplest way to make this work in Pandas?


Solution

  • Inspired by @JohnE's answer, an alternative way to convert the dates to numeric values is to use .toordinal().

    import datetime  # needed below for datetime.datetime.fromordinal
    import pandas as pd
    import numpy as np
    
    # simulate some artificial data
    # ===============================
    np.random.seed(0)
    dates = pd.date_range('2010-01-01', periods=31, freq='D')
    df = pd.DataFrame(np.random.choice(dates,100), columns=['dates'])
    # use toordinal() to get datenum
    df['ordinal'] = [x.toordinal() for x in df.dates]
    
    print(df)
    
            dates  ordinal
    0  2010-01-13   733785
    1  2010-01-16   733788
    2  2010-01-22   733794
    3  2010-01-01   733773
    4  2010-01-04   733776
    5  2010-01-28   733800
    6  2010-01-04   733776
    7  2010-01-08   733780
    8  2010-01-10   733782
    9  2010-01-20   733792
    ..        ...      ...
    90 2010-01-19   733791
    91 2010-01-28   733800
    92 2010-01-01   733773
    93 2010-01-15   733787
    94 2010-01-04   733776
    95 2010-01-22   733794
    96 2010-01-13   733785
    97 2010-01-26   733798
    98 2010-01-11   733783
    99 2010-01-21   733793
    
    [100 rows x 2 columns]    
    
    # plot non-parametric kde on numeric datenum
    ax = df['ordinal'].plot(kind='kde')
    # rename the xticks with labels
    x_ticks = ax.get_xticks()
    ax.set_xticks(x_ticks[::2])
    xlabels = [datetime.datetime.fromordinal(int(x)).strftime('%Y-%m-%d') for x in x_ticks[::2]]
    ax.set_xticklabels(xlabels)
    

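For data spanning several years, one possible variation is to compute the density directly and let matplotlib's date locators place one tick per year. The sketch below is not part of the original answer: it assumes SciPy is installed (pandas' own kind='kde' plot relies on it as well), uses scipy.stats.gaussian_kde with its default bandwidth, and simulates five years of dates purely for illustration. All variable names are hypothetical.

```python
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from scipy.stats import gaussian_kde

# simulated data (assumption): 500 observations spread over five years
rng = np.random.default_rng(0)
all_days = pd.date_range("2010-01-01", "2014-12-31", freq="D")
df = pd.DataFrame({"dates": all_days[rng.integers(0, len(all_days), 500)]})

# convert the datetime64 values to matplotlib date numbers (floats, in days)
x = mdates.date2num(df["dates"].to_numpy())

# fit a Gaussian KDE and evaluate it on an evenly spaced grid of date numbers
kde = gaussian_kde(x)
grid = np.linspace(x.min(), x.max(), 500)
density = kde(grid)

# plot against date numbers, then format the x-axis as calendar years
fig, ax = plt.subplots()
ax.plot(grid, density)
ax.xaxis.set_major_locator(mdates.YearLocator())          # one tick per year
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y"))  # label ticks as YYYY
ax.set_xlabel("year")
ax.set_ylabel("density (per day)")
# call plt.show() or fig.savefig(...) to view the result
```

Because the x values remain in days, the within-year variation is preserved while the tick labels read as years. The locator can be swapped for mdates.MonthLocator() if finer labels are wanted.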