From a csv file having the following format:
Date,Data
01-01-01,111
02-02-02,222
03-03-03,333
I am calculating the monthly average of the values using the following code:
data = pd.read_csv("input.csv")
data['Month'] = pd.DatetimeIndex(data.reset_index()['Date']).month
mean_data = data.groupby('Month').mean()
Then I output a csv file using the following command:
mean_data.to_csv("test.csv")
It works fine and give me the following output:
Month,Data
01,01
02,02
03,03
04,04
...
But now I would like to know how many data have been included inside the monthly average calculation. For that I changed:
mean_data = data.groupby('Month').mean()
by:
mean_data = data.groupby(['Month']).agg(['mean', 'count'])
But the problem comes now. When I want to output the csv , I now have a weird format as follow:
Data,Data,
mean,count,
Month,
01, 01,8,
02, 02,9,
03, 03,7,
04, 04,5,
Which is not really convenient. Instead I would like to have the following output:
Month,Mean,Count
01,01,8
02,02,9
03,03,7
04,04,5
Does anyone know how to achieve that?
Need specify column after groupby
:
#convert first column to datetime
data = pd.read_csv("input.csv", parse_dates=[0])
df['Month'] = df['Date'].dt.month
mean_data = data.groupby('Month')['Data'].agg(['mean', 'count'])
should be simplify:
mean_data = data.groupby(df['Date'].dt.month)['Data'].agg(['mean', 'count'])