I have data that I am visualizing that is categorized by time block. The column looks something like this:
I ultimately want to get to a point where I can order my data by this time block on the x axis of my chart, while plotting the corresponding value on the y. Something like this:
The issue occurs when pandas or altair (my visualization library) tries to order these time blocks, inherently putting the 5:00 pm slot after the 5:00 am slot etc...
I've worked around this in the past by creating a sperate 'Order' column and assigning the row a specific order based on the time block. Something like this
# I would create a dict like this:
daypartsdict = {'11:00 pm - 11:30 pm': 7,
'11:30 pm - 01:00 am': 8,
'12:00 pm - 03:00 pm': 3,
'03:00 pm - 05:00 pm': 4,
'05:00 am - 09:00 am': 1,
'05:00 pm - 07:00 pm': 5,
'07:00 pm - 11:00 pm': 6,
'09:00 am - 12:00 pm': 2,
}
# Create a new column using that dict:
aggdf['Order'] = aggdf['Time'].apply(lambda x: daypartsdict[x])
# And then use the order column as a field in altair to visualize
alt.Chart(data).mark_line(point=True).encode(
x = alt.X(field = 'Time', sort = alt.Sort(field = 'Order')),
y='RTG',
color='Station'
Resulting in something like:
But with over 80 time blocks in the 15minute data, this method seems silly. I'm curious if there is a pandas function or method I could use to make this process more efficient. Open to any and all suggestions on how to improve this!
My suggestion is not the perfect one, I didn't find the way to use your time blocks as axis labels. But it could be a starting point) So here is my suggestion:
data['start_block']=data['time_block'].str.split(" - ", expand=True)[0]
data['start_block']= pd.to_datetime(data['start_block'], format='%I:%M%p')
alt.Chart(data).mark_line(point=True).encode(
x=alt.X('start_block:T',title="Time"),
y='blocks'
)
so you will have something like that.