
How can I make the x-axis of my 2D histogram use dates while avoiding overflow errors from matplotlib axis formatting?


I am working with a set of monthly averaged time-series data that spans 20+ years, and I have put the data into a pandas DataFrame. The index of the DataFrame consists of the datetime objects spanning the time range of the dataset. I have successfully created a 2D histogram subplot of time against another parameter, proton speed. The x-axis values of the histogram were chosen by default, and I'm not sure how to interpret them. I have been trying to format the x-axis with matplotlib commands, primarily the date locator/formatter functions, but they keep throwing a long traceback that ends with: "OverflowError: int too big to convert."

I have not been successful in finding a good solution with other questions or through the documentation.

These are the imports I have used so far:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from datetime import datetime, date, time
import matplotlib.dates as mdates

The following is the pandas DataFrame I have been using. I copied it directly from my notebook; the middle rows are elided.

Datetime    proton_density  proton_temp  He4toprotons  proton_speed  x_dot_RTN  Proton Mass Flux
--------------------------------------------------------------------------------------------
1998-01-23  11.625          58930.0      0.0224        380.90        379.91     7.406307e-19
1998-02-19  9.569           64302.0      0.0294        380.99        380.23     6.097867e-19
1998-03-18  8.767           66770.0      0.0348        384.00        383.19     5.630929e-19
1998-04-14  7.410           121090.0     0.0352        448.44        446.58     5.558023e-19
1998-05-11  7.881           102230.0     0.0271        421.21        419.87     5.552362e-19
...         ...             ...          ...           ...           ...        ...
2021-09-19  8.244           55183.0      0.0356        384.52        383.22     5.302183e-19
2021-10-16  9.664           70601.0      0.0115        418.50        416.21     6.764725e-19
2021-11-12  6.137           93617.0      0.0256        450.47        449.30     4.624021e-19
2021-12-09  4.889           96768.0      0.0177        426.52        424.99     3.487845e-19
2022-01-05  7.280           85944.0      0.0310        434.17        433.01     5.286752e-19

Here is the code I have used to make my histogram:

ax_example = plt.subplot2grid((3, 6), (2, 1), colspan = 2)

H,xedges,yedges = np.histogram2d(SWEPAM_dataframe.index, SWEPAM_dataframe.proton_speed, bins=[50,50])
ax_example.pcolor(xedges, yedges, H.T)
ax_example.set_xlabel("Year")
ax_example.set_ylabel("Proton Speed (km/s)")

The result was this:

[Image: the resulting 2D histogram of proton speed versus time, with non-date values on the x-axis]

As you can see, the x-axis does not show dates by default. I'm not sure how to interpret the default x-axis values, but that matters less here. I have found that I should be using some combination of ax_example.xaxis.set_major_locator(loc) and ax_example.xaxis.set_major_formatter(fmt). However, any time I use these commands I get the overflow error mentioned above, so I cannot turn the x-axis of my histogram into dates.
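For reference, the default x-axis values can be decoded. The sketch below is an illustration, not the original code: it uses a hypothetical stand-in for SWEPAM_dataframe built from only the ten rows in the table above, and passes the index explicitly as int64 nanoseconds since 1970-01-01, which is what the very large default values appear to be. Converting the resulting bin edges back with pd.to_datetime recovers the first and last dates of the data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for SWEPAM_dataframe: only the ten rows shown in the table
dates = pd.to_datetime([
    "1998-01-23", "1998-02-19", "1998-03-18", "1998-04-14", "1998-05-11",
    "2021-09-19", "2021-10-16", "2021-11-12", "2021-12-09", "2022-01-05",
])
speed = np.array([380.90, 380.99, 384.00, 448.44, 421.21,
                  384.52, 418.50, 450.47, 426.52, 434.17])
df = pd.DataFrame({"proton_speed": speed}, index=dates)

# Force nanosecond resolution, then view the timestamps as int64
# nanoseconds since 1970-01-01 (numbers of order 1e17 to 1e18).
x_ns = df.index.to_numpy().astype("datetime64[ns]").astype("int64")
H, xedges, yedges = np.histogram2d(x_ns, df["proton_speed"], bins=[5, 5])

print(f"first x edge: {xedges[0]:.4e}, last x edge: {xedges[-1]:.4e}")

# Reading the edges as nanosecond timestamps turns them back into dates
edge_dates = pd.to_datetime(xedges.astype("int64"))
print(edge_dates[0], edge_dates[-1])
```

These nanosecond counts are far larger than anything matplotlib's date machinery expects as an x value.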


Solution

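  • I could reproduce your issue. The question is why xedges contains such large numbers (on the order of 10^17 to 10^18). pandas stores the index as datetime64[ns], so np.histogram2d bins the dates as integer nanoseconds since the Unix epoch (1970-01-01), and the bin edges come out in nanoseconds. matplotlib's date locators and formatters, however, interpret x values as floating-point days since their epoch, so values this large overflow when they are converted back to dates.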

    I have been trying to get a reliable fix working so that I can give a full answer.

    Also, this overflow error was already reported in "Set xaxis data to datetime in matplotlib", without receiving a convincing answer.


    Alternatively, seaborn handles the pandas datetime64 dtype better than matplotlib does, without requiring any further manipulation of the axes:

    import pandas as pd
    import seaborn as sns
    
    # Input data: "Datetime" is kept as a regular column, not set as the index
    df = pd.DataFrame(columns = ['Datetime','proton_density','proton_temp','He4toprotons','proton_speed','x_dot_RTN','Proton_Mass_Flux'],
                      data = [['1998-01-23',11.625,58930.0,0.0224,380.90,379.91,7.406307e-19],
                              ['1998-02-19', 9.569,64302.0,0.0294,380.99,380.23,6.097867e-19],
                              ['1998-03-18', 8.767,66770.0,0.0348,384.00,383.19,5.630929e-19],
                              ['1998-04-14',7.410,121090.0,0.0352,448.44,446.58,5.558023e-19],
                              ['1998-05-11',7.881,102230.0,0.0271,421.21,419.87,5.552362e-19],
                              ['2021-09-19', 8.244,55183.0,0.0356,384.52,383.22,5.302183e-19],
                              ['2021-10-16', 9.664,70601.0,0.0115,418.50,416.21,6.764725e-19],
                              ['2021-11-12', 6.137,93617.0,0.0256,450.47,449.30,4.624021e-19],
                              ['2021-12-09', 4.889,96768.0,0.0177,426.52,424.99,3.487845e-19],
                              ['2022-01-05', 7.280,85944.0,0.0310,434.17,433.01,5.286752e-19]])
    df['Datetime'] = pd.to_datetime(df['Datetime'])
    

    This produces the expected 2D histogram, with dates on the x-axis and the column names as axis labels:

    sns.histplot(df, x="Datetime", y="proton_speed")
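If you would rather keep np.histogram2d and plain matplotlib, one possible workaround is to convert the index to matplotlib date numbers before binning. This is a sketch, not part of the answer above: it assumes matplotlib >= 3.3, where mdates.date2num returns floating-point days since 1970-01-01, and it again uses a hypothetical ten-row stand-in for SWEPAM_dataframe. The bin edges are then already in the units the date locators and formatters expect, so set_major_locator and set_major_formatter should no longer overflow.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs outside a notebook
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# Hypothetical stand-in for SWEPAM_dataframe: the ten rows shown in the question
dates = pd.to_datetime([
    "1998-01-23", "1998-02-19", "1998-03-18", "1998-04-14", "1998-05-11",
    "2021-09-19", "2021-10-16", "2021-11-12", "2021-12-09", "2022-01-05",
])
speed = np.array([380.90, 380.99, 384.00, 448.44, 421.21,
                  384.52, 418.50, 450.47, 426.52, 434.17])
df = pd.DataFrame({"proton_speed": speed}, index=dates)

# Convert the dates to matplotlib date numbers (float days since the epoch)
# *before* binning, so the histogram edges use matplotlib's date units.
x_days = mdates.date2num(df.index.to_pydatetime())
H, xedges, yedges = np.histogram2d(x_days, df["proton_speed"], bins=[50, 50])

fig, ax = plt.subplots()
ax.pcolormesh(xedges, yedges, H.T)
ax.xaxis.set_major_locator(mdates.YearLocator(base=5))   # a tick every 5 years
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y"))  # label ticks by year
ax.set_xlabel("Year")
ax.set_ylabel("Proton Speed (km/s)")
fig.canvas.draw()  # render once so the tick labels are computed

labels = [t.get_text() for t in ax.get_xticklabels()]
print(labels)
```

In a notebook, drop the matplotlib.use("Agg") line and call plt.show() instead. The five-year tick spacing is an arbitrary choice for a roughly 24-year span.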