Search code examples
pythonpandasdatetimeindex

Split date index to yearly index and monthly columns


I have a DataFrame like this:

data_date   value
2016-01-01  1
2016-01-02  2
2017-02-05  3
2017-02-07  4
2017-03-09  5

I need to convert it to a table with years as the index and months as the columns. (aggregate using sum)

The final output should look like this

      Jan | Feb | Mar | Apr | ........... Dec |
2016  3   | xx  | xx  | xx  | ............    |
2017  xx  | 7   | 5   | xx  | ............    |

Here it what I did:

To make it easy to copy:

import pandas as pd
df=pd.DataFrame([
 {'data_date': '2016-01-01', 'value': 1},
 {'data_date': '2016-01-02', 'value': 2},
 {'data_date': '2017-02-05', 'value': 3},
 {'data_date': '2017-02-07', 'value': 4},
 {'data_date': '2017-03-09', 'value': 5}])

I used TimeGrouper to first aggregate it to monthly like so:

df['data_date'] = pd.to_datetime(df['data_date'])
df.set_index('data_date', inplace=True)
grp_df = df.groupby([pd.Grouper(freq='M')]).sum()

So now I have data aggregated to each row as a month/Yr. I'm stuck as to how to make months the columns and year the row.

Can you help me with that please?


Solution

  • Try pivot_table:

    (df.assign(year=df.data_date.dt.year, month=df.data_date.dt.strftime('%b'))
       .pivot_table(index='year', columns='month', values='value', aggfunc='sum')
       .reindex(['Jan','Feb','Mar','Dec'], axis=1)   # change this to correct month names
    )
    

    Or with pd.crosstab:

    pd.crosstab(
        index=df.data_date.dt.year,
        columns=df.data_date.dt.strftime('%b'),
        values=df['value'],
        aggfunc='sum'
    ).reindex(['Jan','Feb','Mar','Dec'], axis=1)
    

    Output:

    month  Jan  Feb  Mar  Dec
    year                     
    2016   3.0  NaN  NaN  NaN
    2017   NaN  7.0  5.0  NaN