Search code examples
pythonpandasdatetimealtairx-axis

Ordering by time block - Pandas and Altair


I have data that I am visualizing that is categorized by time block. The column looks something like this:

time column

I ultimately want to get to a point where I can order my data by this time block on the x axis of my chart, while plotting the corresponding value on the y. Something like this:

example chart

The issue occurs when pandas or altair (my visualization library) tries to order these time blocks, inherently putting the 5:00 pm slot after the 5:00 am slot etc...

I've worked around this in the past by creating a sperate 'Order' column and assigning the row a specific order based on the time block. Something like this

# I would create a dict like this:
daypartsdict = {'11:00 pm - 11:30 pm': 7,
'11:30 pm - 01:00 am': 8,
 '12:00 pm - 03:00 pm': 3,
 '03:00 pm - 05:00 pm': 4,
 '05:00 am - 09:00 am': 1,
 '05:00 pm - 07:00 pm': 5,
 '07:00 pm - 11:00 pm': 6,
 '09:00 am - 12:00 pm': 2,
 
 }

# Create a new column using that dict: 
aggdf['Order'] = aggdf['Time'].apply(lambda x: daypartsdict[x])


# And then use the order column as a field in altair to visualize 

alt.Chart(data).mark_line(point=True).encode(
   x = alt.X(field = 'Time', sort = alt.Sort(field = 'Order')),
   y='RTG',
   color='Station'

Resulting in something like:

sample axis

But with over 80 time blocks in the 15minute data, this method seems silly. I'm curious if there is a pandas function or method I could use to make this process more efficient. Open to any and all suggestions on how to improve this!


Solution

  • My suggestion is not the perfect one, I didn't find the way to use your time blocks as axis labels. But it could be a starting point) So here is my suggestion:

    1. create a separate column with start of your time blocks using split function and transforming this column to a datetime type
    data['start_block']=data['time_block'].str.split(" - ", expand=True)[0]
    data['start_block']= pd.to_datetime(data['start_block'], format='%I:%M%p')
    
    1. use time for axis encoding
    alt.Chart(data).mark_line(point=True).encode(
        x=alt.X('start_block:T',title="Time"),
        y='blocks'
    )
    

    so you will have something like that.

    enter image description here