I have a pandas Series with a datetime index that I am trying to visualize as a bar graph. My code is below, but the chart I get does not look right. How do I fix this?
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(100)
dti = pd.date_range('2012-12-31', periods=30, freq='Q')
s2 = pd.Series(np.random.randint(100,1000,size=(30)),index=dti)
df4 = s2.to_frame(name='count')
print('\ndf4:')
print(df4)
print(type(df4))
f2 = plt.figure("Quarterly",figsize=(10,5))
ax = plt.subplot(1,1,1)
ax.bar(df4.index,df4['count'])
plt.tight_layout()
plt.show()
Unfortunately, matplotlib's bar plots don't seem to play along very happily with pandas dates.
In theory, matplotlib expresses the bar widths in days. But if you try something like ax.bar(df4.index, df4['count'], width=30), you'll see a plot with extremely wide bars, almost completely filling the plot. Experimenting with the width, something weird happens. When width is smaller than 2, it looks like it is expressed in days. But with width larger than 2, it suddenly jumps to something much wider.
This is what I see on my system (matplotlib 3.1.2, pandas 0.25.3, Windows).
A workaround uses the bar plots from pandas. These seem to make the bars categorical, with one tick per bar. But they get labelled with a full date including hours, minutes and seconds. You could relabel them, for example like:
df4.plot.bar(y='count', width=0.9, ax=ax)
plt.xticks(range(len(df4.index)),
           [t.to_pydatetime().strftime("%b '%y") for t in df4.index],
           rotation=90)
Investigating further, the inconsistent jumping of matplotlib's bar width seems related to the frequency built into pandas times. So, a solution could be to convert the dates to matplotlib dates. Trying this, yes, the widths get expressed consistently in days.
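To illustrate the conversion, here is a minimal sketch that goes through matplotlib.dates.date2num, so the x positions are plain floats counted in days. The four quarter-end dates and the counts are made-up sample values, not the data from the question, and I have only reasoned this through rather than checked it across matplotlib versions.

```python
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# Made-up sample data: four quarter-end dates and arbitrary counts
dates = pd.to_datetime(['2012-12-31', '2013-03-31', '2013-06-30', '2013-09-30'])
counts = [108, 380, 579, 424]

# Convert to matplotlib's date numbers (floats, one unit per day)
x = mdates.date2num(dates.to_pydatetime())

fig, ax = plt.subplots(figsize=(10, 5))
bars = ax.bar(x, counts, width=30)  # width is now 30 days for every bar
ax.xaxis_date()                     # tell the axis that the floats are dates
ax.xaxis.set_major_formatter(mdates.DateFormatter("%b '%y"))
```

Add plt.show() at the end to display the figure. Because x holds ordinary numbers, the width no longer depends on any frequency information attached to the pandas index.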
Unfortunately, the quarterly dates don't have exactly the same number of days between them, resulting in some bars too wide, and others too narrow. A solution to this next problem is explicitly calculating the number of days for each bar. In order to get nice separations between the bars, it helps to draw their edges in white.
x = [t.date() for t in df4.index]  # convert the pandas timestamps to plain Python dates
widths = [t1 - t0 for t0, t1 in zip(x, x[1:])]  # time differences between consecutive dates
widths += [widths[-1]]  # the very last bar didn't get a width, just repeat the last width
ax.bar(x, df4['count'], width=widths, edgecolor='white')
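The same per-bar width idea can also be sketched with numeric dates, which keeps the widths as plain day counts instead of timedelta objects. This is a variation on the code above, not a replacement for it. It swaps in matplotlib.dates.date2num and numpy.diff, and the dates and counts are made-up sample values.

```python
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up sample data: four quarter-end dates and arbitrary counts
dates = pd.to_datetime(['2012-12-31', '2013-03-31', '2013-06-30', '2013-09-30'])
counts = [108, 380, 579, 424]

x = mdates.date2num(dates.to_pydatetime())  # floats, one unit per day
widths = np.diff(x)                         # days between consecutive dates
widths = np.append(widths, widths[-1])      # repeat the last width for the final bar

fig, ax = plt.subplots(figsize=(10, 5))
bars = ax.bar(x, counts, width=widths, edgecolor='white')
ax.xaxis_date()
ax.xaxis.set_major_formatter(mdates.DateFormatter("%b '%y"))
```

Add plt.show() at the end to display the figure. Printing widths is a quick way to confirm that the quarters really differ by a day or two, which is why a single fixed width leaves uneven gaps.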