I have the following dataframe (except that my actual data spans over 25 years):
import pandas as pd
df = pd.DataFrame(
    dict(
        date=pd.date_range(start="2020-01-01", end="2020-12-31", freq="MS"),
        data=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
    ),
)
df
Output:
date data
0 2020-01-01 1
1 2020-02-01 2
2 2020-03-01 3
3 2020-04-01 4
4 2020-05-01 5
5 2020-06-01 6
6 2020-07-01 7
7 2020-08-01 8
8 2020-09-01 9
9 2020-10-01 10
10 2020-11-01 11
11 2020-12-01 12
And I get different results with matplotlib and pandas default plotting:
import matplotlib as mpl
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
fig = mpl.figure.Figure(constrained_layout=True)
axs = fig.subplot_mosaic("ac;bd")
ax = axs["a"]
ax.bar(x="date", height="data", data=df, width=15)
ax = axs["b"]
ax.bar(x="date", height="data", data=df, width=15)
locator = mdates.AutoDateLocator(minticks=12, maxticks=24)
formatter = mdates.ConciseDateFormatter(locator)
ax.xaxis.set_major_locator(locator)
ax.xaxis.set_major_formatter(formatter)
ax = axs["c"]
df.plot.bar(x="date", y="data", ax=ax, legend=False)
ax = axs["d"]
df.plot.bar(x="date", y="data", ax=ax, legend=False)  # incorrect year -> 1970 instead of 2020
locator = mdates.AutoDateLocator(minticks=12, maxticks=24)
formatter = mdates.ConciseDateFormatter(locator)
ax.xaxis.set_major_locator(locator)
ax.xaxis.set_major_formatter(formatter)
for k, ax in axs.items():
    for label in ax.get_xticklabels():
        label.set_rotation(40)
        label.set_horizontalalignment('right')
fig
Output:
I would like to use pandas for plotting and then format the ticks appropriately for a publication-ready plot. However, it appears that I lose the datetime information, or get the incorrect year, when plotting with pandas.
Is there a way to format the axis tick labels using mdates features without using the data directly? That is, if I resample the data or slice a different year, I'd like the axis to reflect that automatically.
Here's a simpler illustration of the issue I'm having:
import matplotlib as mpl
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
fig = mpl.figure.Figure(constrained_layout=True)
axs = fig.subplot_mosaic("a")
ax = axs["a"]
df.plot.bar(x="date", y="data", ax=ax, legend=False) # incorrect year -> 1970 instead of 2020
formatter = mdates.DateFormatter("%Y - %b")
ax.xaxis.set_major_formatter(formatter)
fig
The dates are all wrong when using DateFormatter.
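When you draw a bar plot with pandas, the bars are placed at the categorical x-coordinates 0, 1, 2, 3, etc., and the dates are used only as tick label text. That's why mdates.DateFormatter returns 1970: it interprets these coordinates as days since the epoch (1970-01-01 by default in recent matplotlib versions).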
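To see this for yourself, here is a minimal sketch that rebuilds the dataframe from the question, inspects the tick positions pandas sets, and converts them the way a date formatter would. It assumes a matplotlib version whose default date epoch is 1970-01-01; the names tick_positions and as_dates are only for illustration.

```python
import matplotlib.dates as mdates
import pandas as pd
from matplotlib.figure import Figure

# Same data as in the question.
df = pd.DataFrame(
    dict(
        date=pd.date_range(start="2020-01-01", end="2020-12-31", freq="MS"),
        data=list(range(1, 13)),
    )
)

fig = Figure()
ax = fig.subplots()
df.plot.bar(x="date", y="data", ax=ax, legend=False)

# pandas places one tick per bar at integer positions, not at date values.
tick_positions = [float(p) for p in ax.get_xticks()]
print(tick_positions)

# A date formatter reads those positions as days since the epoch,
# so every bar ends up in the first days of the epoch year.
as_dates = [mdates.num2date(p).date() for p in tick_positions]
print(as_dates[0], as_dates[-1])
```

The positions carry no information about 2020 at all, so no mdates locator or formatter can recover the real dates from the axis alone.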
You can set the tick labels manually:
ax.set_xticklabels(df["date"].dt.strftime("%Y - %b"))
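Putting it together, here is a sketch of how that line could be wrapped so the labels follow whatever slice or resample you plot. The helper label_bars_with_dates is a hypothetical name, not part of pandas or matplotlib, and it assumes the axis has exactly one tick per row, which is what df.plot.bar produces.

```python
import pandas as pd
from matplotlib.figure import Figure


def label_bars_with_dates(ax, dates, fmt="%Y - %b"):
    """Hypothetical helper: relabel the bar ticks from the plotted dates."""
    labels = list(pd.Series(dates).dt.strftime(fmt))
    ax.set_xticklabels(labels)
    return labels


df = pd.DataFrame(
    dict(
        date=pd.date_range(start="2020-01-01", end="2020-12-31", freq="MS"),
        data=list(range(1, 13)),
    )
)

# Plot only the second half of the year to show that the labels follow the data.
subset = df[df["date"] >= "2020-07-01"]

fig = Figure(constrained_layout=True)
ax = fig.subplots()
subset.plot.bar(x="date", y="data", ax=ax, legend=False)
labels = label_bars_with_dates(ax, subset["date"])

# Same label styling as in the question.
for label in ax.get_xticklabels():
    label.set_rotation(40)
    label.set_horizontalalignment("right")
```

The labels are still built from the data rather than by an mdates locator, so features such as ConciseDateFormatter are not available this way. If you need those, plotting with ax.bar on the real date values, as in panels "a" and "b" of the question, keeps the axis in date units.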