I have a DataFrame like this:
data_date value
2016-01-01 1
2016-01-02 2
2017-02-05 3
2017-02-07 4
2017-03-09 5
I need to convert it to a table with years as the index and months as the columns. (aggregate using sum)
The final output should look like this
Jan | Feb | Mar | Apr | ........... Dec |
2016 3 | xx | xx | xx | ............ |
2017 xx | 7 | 5 | xx | ............ |
Here it what I did:
To make it easy to copy:
import pandas as pd
df=pd.DataFrame([
{'data_date': '2016-01-01', 'value': 1},
{'data_date': '2016-01-02', 'value': 2},
{'data_date': '2017-02-05', 'value': 3},
{'data_date': '2017-02-07', 'value': 4},
{'data_date': '2017-03-09', 'value': 5}])
I used TimeGrouper to first aggregate it to monthly like so:
df['data_date'] = pd.to_datetime(df['data_date'])
df.set_index('data_date', inplace=True)
grp_df = df.groupby([pd.Grouper(freq='M')]).sum()
So now I have data aggregated to each row as a month/Yr. I'm stuck as to how to make months the columns and year the row.
Can you help me with that please?
Try pivot_table:
(df.assign(year=df.data_date.dt.year, month=df.data_date.dt.strftime('%b'))
.pivot_table(index='year', columns='month', values='value', aggfunc='sum')
.reindex(['Jan','Feb','Mar','Dec'], axis=1) # change this to correct month names
)
Or with pd.crosstab
:
pd.crosstab(
index=df.data_date.dt.year,
columns=df.data_date.dt.strftime('%b'),
values=df['value'],
aggfunc='sum'
).reindex(['Jan','Feb','Mar','Dec'], axis=1)
Output:
month Jan Feb Mar Dec
year
2016 3.0 NaN NaN NaN
2017 NaN 7.0 5.0 NaN