Matplotlib bar graph incoherent behavior when using bottom and height parameters


I am trying to plot a bar chart where, for each day on the X axis, each activity appears as a bar running from its start time to its end time. The Y axis therefore goes from 0 to 24; for example, one day could have a bar from 1 AM to 2 AM and a second bar from 3 PM to 5 PM.

I used matplotlib's bar function with the bottom and height parameters, and to a certain extent that works. With very little data, everything is displayed correctly, but with dozens of activities, some bars come out wrong.

The code is the following (taken from Power BI; I've never used Pandas, so I just converted the data straight to a Python list):

import os, uuid, matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot
import pandas
import datetime
import matplotlib.ticker
import matplotlib.dates
import numpy

dataset = pandas.read_csv('input_df_3a3333a0-fd5d-4630-8707-5fc23cb0b326.csv')

matplotlib.pyplot.figure(figsize=(5.55555555555556,4.16666666666667), dpi=72)

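# Convert the DataFrame to a plain list of [start, end] rows
test_date_time = dataset.to_numpy().tolist()
# Drop rows whose start value is a float (missing values are read as NaN)
test_date_time = list(filter(lambda x: type(x[0]) is not float, test_date_time))
# Parse the ISO strings into (start, end) datetime pairs
test_date_time = [(datetime.datetime.fromisoformat(x[0]), datetime.datetime.fromisoformat(x[1])) for x in test_date_time]
# Sort the activities by start time
test_date_time = sorted(test_date_time, key=lambda x:x[0])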

values = {"days": [], "bottom": [], "height": []}


# ### KIND OF WORKING
for (test_start, test_end) in test_date_time:
    values["days"].append(test_start.date())
    values["bottom"].append(test_start.time().hour + (test_start.time().minute / 60) + (test_start.time().second / 3600))
    values["height"].append((test_end - test_start).total_seconds() / 3600)
matplotlib.pyplot.bar(x=values["days"], height=values["height"], bottom=values["bottom"])


matplotlib.pyplot.show()

I have checked the values dictionary and it looks correct to me. But when I plot it, I often get bars far exceeding 24, like this for example:

[Image: bar graph showing the issue]

The problem is that, when I check in the debugger, the last height value for April 4th is 1.44 hours and the bottom value is 19.01 hours, so I should get a bar going from 19.01 to 20.45 and nothing more, which is not at all what I get.

I have looked at "Timeline bar graph using python and matplotlib", but I'm curious why this is happening. Example data can be found here: https://filebin.net/rivcvi6d9v92sywk


Solution

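  • The reason is an outlier at index 513: one activity runs from 04/04 to 04/06, so its height is about 47.9 hours. Stacked on a bottom of about 15.05, the bar reaches almost 63 on the Y axis: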

    2024-04-04T15:02:44.0000000,2024-04-06T14:57:45.0000000
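
  • One possible way to handle such rows, if you want every bar to stay within 0 to 24, is to split an activity at each midnight so that every piece belongs to a single day. The following is only a minimal sketch under that assumption: the helper names hours_since_midnight and split_at_midnight are made up for this example, and the timestamps are the outlier row above with the fractional seconds dropped.

```python
import datetime


def hours_since_midnight(moment):
    """Return the time of day of `moment` as fractional hours (0 <= value < 24)."""
    return moment.hour + moment.minute / 60 + moment.second / 3600


def split_at_midnight(start, end):
    """Split the interval [start, end) into per-day (day, bottom, height) pieces.

    Hypothetical helper for illustration: each piece stops at the next
    midnight, so bottom + height never exceeds 24.
    """
    segments = []
    current = start
    while current < end:
        next_midnight = datetime.datetime.combine(
            current.date() + datetime.timedelta(days=1), datetime.time.min
        )
        segment_end = min(end, next_midnight)
        height = (segment_end - current).total_seconds() / 3600
        segments.append((current.date(), hours_since_midnight(current), height))
        current = segment_end
    return segments


# The outlier row from the data set (fractional seconds omitted).
start = datetime.datetime.fromisoformat("2024-04-04T15:02:44")
end = datetime.datetime.fromisoformat("2024-04-06T14:57:45")

# Unsplit, as in the question: the top of the bar is bottom + height.
bottom = hours_since_midnight(start)
height = (end - start).total_seconds() / 3600
print("unsplit top of bar:", bottom + height)  # far above 24

# Split: one piece per calendar day, each within the 0..24 range.
segments = split_at_midnight(start, end)
for day, seg_bottom, seg_height in segments:
    print(day, seg_bottom, seg_bottom + seg_height)
```

    The days, bottoms and heights collected this way can be passed to matplotlib.pyplot.bar exactly as in the question. Whether splitting is the right choice depends on the data: if a two-day activity is simply a recording error, filtering out rows where bottom + height exceeds 24 may be more appropriate.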