I am trying to plot a bar chart with one column per day on the X axis, where each activity is drawn as a bar running from its start time to its end time. The Y axis therefore goes from 0 to 24, and a given day could have, for example, one bar from 1 AM to 2 AM and a second bar from 3 PM to 5 PM.
I used matplotlib's bar plot with the bottom and height parameters, and to a certain extent that works. With very little data everything is displayed correctly, but with dozens of activities the plot comes out wrong.
The code is the following (taken from Power BI; I've never used pandas, so I just converted the data straight to a Python list):
import os, uuid, matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot
import pandas
import datetime
import matplotlib.ticker
import matplotlib.dates
import numpy
dataset = pandas.read_csv('input_df_3a3333a0-fd5d-4630-8707-5fc23cb0b326.csv')
matplotlib.pyplot.figure(figsize=(5.55555555555556,4.16666666666667), dpi=72)
test_date_time = dataset.to_numpy().tolist()
test_date_time = list(filter(lambda x: type(x[0]) is not float, test_date_time))
test_date_time = [(datetime.datetime.fromisoformat(x[0]), datetime.datetime.fromisoformat(x[1])) for x in test_date_time]
test_date_time = sorted(test_date_time, key=lambda x:x[0])
values = {"days": [], "bottom": [], "height": []}
# ### KIND OF WORKING
for (test_start, test_end) in test_date_time:
    values["days"].append(test_start.date())
    values["bottom"].append(test_start.time().hour + (test_start.time().minute / 60) + (test_start.time().second / 3600))
    values["height"].append((test_end - test_start).total_seconds() / 3600)
matplotlib.pyplot.bar(x=values["days"], height=values["height"], bottom=values["bottom"])
matplotlib.pyplot.show()
I have checked the values dictionary and it looks fine to me. But when I plot it, I often get bars far exceeding 24, like this for example:
The problem is that, according to the debugger, the last height value for April 4th is 1.44 hours and the bottom value is 19.01 hours, so I should have a bar going from 19.01 to 20.45 and that's it, which is not at all what I get.
I have looked at "Timeline bar graph using python and matplotlib", but I'm just curious: why is this happening? Example data can be found here: https://filebin.net/rivcvi6d9v92sywk
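The reason is that you have an outlier at index 513, an activity that lasts from 04/04 to 04/06. Its height is about 47.9 hours on top of a bottom of about 15.05, and since every bar for a given day is drawn at the same x position, that one bar is plotted on April 4th and reaches about 62.96: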
2024-04-04T15:02:44.0000000,2024-04-06T14:57:45.0000000
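One possible way to keep every bar inside the 0 to 24 range is to cut any activity that crosses midnight into one piece per day before plotting. The following is only a sketch under that assumption: `split_at_midnight` is a hypothetical helper name (it is not part of matplotlib or of the original code), and whether splitting is the right treatment for such a row, rather than dropping it as bad data, depends on what the activity represents.

```python
import datetime


def split_at_midnight(start, end):
    """Split the interval [start, end) into per-day pieces.

    Returns a list of (day, bottom, height) tuples, with bottom and
    height expressed in hours, so that bottom + height never exceeds 24.
    Hypothetical helper, written for illustration only.
    """
    pieces = []
    current = start
    while current < end:
        # Midnight at the start and at the end of the current day.
        day_start = datetime.datetime.combine(current.date(), datetime.time.min)
        next_midnight = day_start + datetime.timedelta(days=1)
        # The piece stops at the activity's end or at midnight, whichever is first.
        piece_end = min(end, next_midnight)
        bottom = (current - day_start).total_seconds() / 3600
        height = (piece_end - current).total_seconds() / 3600
        pieces.append((current.date(), bottom, height))
        current = piece_end
    return pieces


# The outlier row quoted above, written as datetime objects.
outlier_start = datetime.datetime(2024, 4, 4, 15, 2, 44)
outlier_end = datetime.datetime(2024, 4, 6, 14, 57, 45)
pieces = split_at_midnight(outlier_start, outlier_end)
```

In the question's loop, the three `append` calls could then be replaced by iterating over `split_at_midnight(test_start, test_end)` and appending each `(day, bottom, height)` piece, so the outlier becomes three bars (April 4th, 5th and 6th) instead of one bar that overshoots the axis.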