Inconsistent internal representation of dates in matplotlib/pandas


import pandas as pd

index = pd.to_datetime(['2016-05-01', '2016-11-01', '2017-05-02'])
data = pd.DataFrame({'a': [1, 2, 3],
                     'b': [4, 5, 6]}, index=index)
ax = data.plot()
print(ax.get_xlim())

# Out: (736066.7, 736469.3)

Now, if we change the last date from 2017-05-02 to 2017-05-01:

index = pd.to_datetime(['2016-05-01', '2016-11-01', '2017-05-01'])
data = pd.DataFrame({'a': [1, 2, 3],
                     'b': [4, 5, 6]}, index=index)
ax = data.plot()
print(ax.get_xlim())

# Out: (184.8, 189.2)

The first example seems consistent with the matplotlib docs:

Matplotlib represents dates using floating point numbers specifying the number of days since 0001-01-01 UTC, plus 1

Why does the second example return something seemingly completely different? I'm using pandas version 0.22.0 and matplotlib version 2.2.2.


Solution

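  • Pandas uses different units to represent dates and times on the axes, depending on the index. The limits in the second example are consistent with quarterly period ordinals (185 is 2016Q2 and 189 is 2017Q2, counted from 1970Q1), which suggests that pandas inferred a regular frequency from the evenly spaced dates, while the irregular index of the first example falls back to matplotlib's date numbers. This means that different locators are in use.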

    In the first case,

    print(ax.xaxis.get_major_locator())
    # Out: pandas.plotting._converter.PandasAutoDateLocator
    

    In the second case,

    print(ax.xaxis.get_major_locator())
    # Out: pandas.plotting._converter.TimeSeries_DateLocator
    

    You can force pandas to always use the PandasAutoDateLocator with the x_compat argument:

    data.plot(x_compat=True)
    

    This ensures that you always get the same datetime definition, consistent with the matplotlib.dates convention.

    The drawback is that this removes the nice quarterly ticking and replaces it with the standard matplotlib ticking.

    On the other hand, it then lets you use the highly customizable matplotlib.dates tickers and formatters, for example to get quarterly ticks and labels:

    import matplotlib.pyplot as plt
    import matplotlib.dates as mdates
    import matplotlib.ticker as mticker
    import pandas as pd
    
    index = pd.to_datetime(['2016-05-01', '2016-11-01', '2017-05-01'])
    data = pd.DataFrame({'a': [1, 2, 3],
                         'b': [4, 5, 6]}, index=index)
    ax = data.plot(x_compat=True)
    
    # Quarterly ticks
    ax.xaxis.set_major_locator(mdates.MonthLocator((1,4,7,10)))
    
    # Formatting: label each tick "Q<n>" and put the year under every Q1
    def func(x, pos):
        date = mdates.num2date(x)  # matplotlib date number -> datetime
        q = (date.month - 1) // 3 + 1
        tx = "Q{}".format(q)
        if q == 1:
            tx += "\n{}".format(date.year)
        return tx
    ax.xaxis.set_major_formatter(mticker.FuncFormatter(func))
    plt.setp(ax.get_xticklabels(), rotation=0, ha="center")
    
    plt.show()
    

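As a rough check of the two points above, the following sketch first computes quarterly period ordinals for the end dates of the second example, then plots both indexes with x_compat=True and converts the axis limits back to dates. It assumes that the second example's limits are quarterly period ordinals counted from 1970Q1 (an inference from the numbers in the question, not something the answer states), it uses the non-interactive Agg backend, and the helper name xlim_as_dates is made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, no window needed

import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# Part 1: quarterly period ordinals of the first and last date
# in the second example (assumed unit of the odd-looking limits).
first = pd.Period('2016-05-01', freq='Q').ordinal
last = pd.Period('2017-05-01', freq='Q').ordinal
print(first, last)  # 185 189


# Part 2: with x_compat=True the limits are matplotlib date numbers
# for both indexes, so num2date() turns them back into real dates.
def xlim_as_dates(last_date):
    """Hypothetical helper: plot with x_compat=True, return xlim as datetimes."""
    index = pd.to_datetime(['2016-05-01', '2016-11-01', last_date])
    data = pd.DataFrame({'a': [1, 2, 3],
                         'b': [4, 5, 6]}, index=index)
    ax = data.plot(x_compat=True)
    lo, hi = ax.get_xlim()
    plt.close(ax.figure)  # avoid accumulating open figures
    return mdates.num2date(lo), mdates.num2date(hi)


for last_date in ['2017-05-02', '2017-05-01']:
    lo, hi = xlim_as_dates(last_date)
    print(last_date, lo.date(), hi.date())
```

With matplotlib's default 5% axis margin, ordinals 185 and 189 would give limits of about (184.8, 189.2), which matches the output in the question. With x_compat=True, both indexes should give limits that convert to dates in 2016 and 2017.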