Search code examples
pythonpandasdataframemulti-indexpython-datetime

Resample by Date in Pandas — messes up a date in index


I have a multi-index dataFrame in Pandas, with data indexed by building, and then by date. The different columns represent different kinds of energy, and the values represent how much energy was used for a given month. Image of the dataframe's head is here. I'd like to turn this into yearly data. I currently have the line

df.unstack(level=0).resample('BAS-JUL').sum()

and this works almost perfectly. Here is the issue: all the dates are given as the 1st of the month, but for some reason, as it does resample, it picks July 2nd as the cut-off for 2012. So the number for July 1, 2012 ends up being counted in the 2011 data. It ends up looking like this. You can see that the second value in the Usage Month column is July 2. Other than that, the resample appears to work perfectly.

If I run df.index.get_level_values(1)[:20], the output is:

DatetimeIndex(['2011-07-01', '2011-08-01', '2011-09-01', '2011-10-01',
           '2011-11-01', '2011-12-01', '2012-01-01', '2012-02-01',
           '2012-03-01', '2012-04-01', '2012-05-01', '2012-06-01',
           '2012-07-01', '2012-08-01', '2012-09-01', '2012-10-01',
           '2012-11-01', '2012-12-01', '2013-01-01', '2013-02-01'],
          dtype='datetime64[ns]', name='Usage Month', freq=None)

So the index is July 1 2012 in the original dataframe.

Any ideas on how to fix this mini-bug would be appreciated!


Solution

  • Use 'AS-JUL':

    df.unstack(level=0).resample('AS-JUL').sum()
    

    The B is for Business Annual Start.