Tags: python, pandas, datetime, seaborn, kdeplot

Plotting time on x-axis


I'm working with a dataset that only contains datetime objects and I have retrieved the day of the week and reformatted the time in a separate column like this (conversion functions included below):

    datetime            day_of_week time_of_day
0   2021-06-13 12:56:16 Sunday      20:00:00
5   2021-06-13 12:56:54 Sunday      20:00:00
6   2021-06-13 12:57:27 Sunday      20:00:00
7   2021-07-16 18:55:42 Friday      20:00:00
8   2021-07-16 18:56:03 Friday      20:00:00
9   2021-06-04 18:42:06 Friday      20:00:00
10  2021-06-04 18:49:05 Friday      20:00:00
11  2021-06-04 18:58:22 Friday      20:00:00

What I would like to do is create a kde plot with x-axis = time_of_day (spanning 00:00:00 to 23:59:59), y-axis to be the count of each day_of_week at each hour of the day, and hue = day_of_week. In essence, I'd have seven different distributions representing occurrences during each day of the week.

Here's a sample of the data and my code. Any help would be appreciated:

import calendar

import pandas as pd

df = pd.DataFrame([
    '2021-06-13 12:56:16',
    '2021-06-13 12:56:16',
    '2021-06-13 12:56:16',
    '2021-06-13 12:56:16',
    '2021-06-13 12:56:54',
    '2021-06-13 12:56:54',
    '2021-06-13 12:57:27',
    '2021-07-16 18:55:42',
    '2021-07-16 18:56:03',
    '2021-06-04 18:42:06',
    '2021-06-04 18:49:05',
    '2021-06-04 18:58:22',
    '2021-06-08 21:31:44',
    '2021-06-09 02:14:30',
    '2021-06-09 02:20:19',
    '2021-06-12 18:05:47',
    '2021-06-15 23:46:41',
    '2021-06-15 23:47:18',
    '2021-06-16 14:19:08',
    '2021-06-17 19:08:17',
    '2021-06-17 22:37:27',
    '2021-06-21 23:31:32',
    '2021-06-23 20:32:09',
    '2021-06-24 16:04:21',
    '2020-05-22 18:29:02',
    '2020-05-22 18:29:02',
    '2020-05-22 18:29:02',
    '2020-05-22 18:29:02',
    '2020-08-31 21:38:07',
    '2020-08-31 21:38:22',
    '2020-08-31 21:38:42',
    '2020-08-31 21:39:03',
], columns=['datetime'])

def convert_date(date):
    return calendar.day_name[date.weekday()]

def convert_hour(time):
    return time[:2]+':00:00'

df['day_of_week'] = pd.to_datetime(df['datetime']).apply(convert_date)
df['time_of_day'] = df['datetime'].astype(str).apply(convert_hour)
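
One thing worth noting about the table above: convert_hour slices the first two characters of the full 'YYYY-MM-DD HH:MM:SS' string, which are the "20" of the year, so every row ends up as 20:00:00. A minimal sketch of one way to get the hour bucket from the parsed timestamp instead, assuming the column parses cleanly with pd.to_datetime (the three sample rows are taken from the data above; the name `sliced` is only illustrative):

```python
import pandas as pd

# Three rows taken from the question's sample data.
df = pd.DataFrame(
    {'datetime': ['2021-06-13 12:56:16',
                  '2021-07-16 18:55:42',
                  '2021-06-09 02:14:30']}
)

# Slicing the raw string picks up the first two digits of the year, not the hour.
sliced = df['datetime'].astype(str).str[:2] + ':00:00'

# Parsing first and then formatting avoids depending on string positions.
parsed = pd.to_datetime(df['datetime'])
df['day_of_week'] = parsed.dt.day_name()
df['time_of_day'] = parsed.dt.strftime('%H:00:00')

print(df)
```

Here `dt.day_name()` stands in for the calendar-based convert_date helper; either approach should give the same English day names.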

Solution

  • Let's try:

    1. Convert the datetime column with pd.to_datetime.
    2. Create a Categorical column from the day_of_week codes (so that the days are ordered Monday through Sunday).
    3. Normalize time_of_day to a single day (so that comparisons work correctly). This treats all events as if they occurred on the same day, which makes the plotting logic much simpler.
    4. Plot the kdeplot.
    5. Set the x-axis formatter to display only HH:MM:SS.

    import calendar
    
    import pandas as pd
    import seaborn as sns
    from matplotlib import pyplot as plt, dates as mdates
    
    
    # df = pd.DataFrame({...})
    
    # Convert to datetime
    df['datetime'] = pd.to_datetime(df['datetime'])
    # Create Categorical Column
    cat_type = pd.CategoricalDtype(list(calendar.day_name), ordered=True)
    df['day_of_week'] = pd.Categorical.from_codes(
        df['datetime'].dt.day_of_week, dtype=cat_type
    )
    # Create Normalized Date Column
    df['time_of_day'] = pd.to_datetime('2000-01-01 ' +
                                       df['datetime'].dt.time.astype(str))
    
    # Plot
    ax = sns.kdeplot(data=df, x='time_of_day', hue='day_of_week')
    
    # X axis format
    ax.set_xlim([pd.to_datetime('2000-01-01 00:00:00'),
                 pd.to_datetime('2000-01-01 23:59:59')])
    ax.xaxis.set_major_formatter(mdates.DateFormatter('%H:%M:%S'))
    
    plt.tight_layout()
    plt.show()
    

    Note that the sample size is small here.

    [Figure: kdeplot of time_of_day by day_of_week]
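
The normalization in step 3 builds the new timestamps by string concatenation. As an alternative sketch (not what the code above does), the same column could probably be produced with datetime arithmetic: subtract each row's own midnight and add the remainder to a fixed anchor date. The names `anchor` and `since_midnight` are illustrative, and the 2000-01-01 date is just the arbitrary day used above.

```python
import pandas as pd

# Three rows taken from the question's sample data.
df = pd.DataFrame({'datetime': pd.to_datetime([
    '2021-06-13 12:56:16',
    '2021-07-16 18:55:42',
    '2021-06-09 02:14:30',
])})

# Arbitrary anchor date; only the time-of-day part matters for the plot.
anchor = pd.Timestamp('2000-01-01')

# Time elapsed since midnight of each row's own date (a Timedelta per row).
since_midnight = df['datetime'] - df['datetime'].dt.normalize()

# Shift every event onto the anchor date while keeping its time of day.
df['time_of_day'] = anchor + since_midnight

print(df)
```

This keeps the column as datetime64 throughout, so no string parsing is needed.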

    If you want counts on the y-axis, histplot may be a better fit:

    ax = sns.histplot(data=df, x='time_of_day', hue='day_of_week')
    

    [Figure: histplot of time_of_day by day_of_week]
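
If the goal is literally the count of events per hour for each day of the week, it may help to tabulate those counts directly before plotting. A minimal sketch using pd.crosstab, assuming the datetime column is already parsed (the five sample rows are taken from the question; the name `counts` is illustrative):

```python
import calendar

import pandas as pd

# Five rows taken from the question's sample data.
df = pd.DataFrame({'datetime': pd.to_datetime([
    '2021-06-13 12:56:16',
    '2021-06-13 12:56:54',
    '2021-07-16 18:55:42',
    '2021-06-04 18:42:06',
    '2021-06-09 02:14:30',
])})

hour = df['datetime'].dt.hour.rename('hour')
day = df['datetime'].dt.day_name().rename('day_of_week')

# Rows are hours 0..23, columns are Monday..Sunday; missing combinations become 0.
counts = pd.crosstab(hour, day).reindex(
    index=range(24), columns=list(calendar.day_name), fill_value=0
)

print(counts.loc[[2, 12, 18]])
```

Each column of `counts` is then one per-day distribution over the 24 hours, which is roughly what histplot draws when the bins are one hour wide.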