I'm trying to plot a chart that shows daily Max and Min temperature from the period of 2005-2014. The temperature is denoted in the Data_Value column.
I have a hard time trying to figure out how to proceed here. I need to show the x axis in months instead of years or days.
Dataset
ID Element Data_Value
Date
2005-01-01 USW00004848 TMIN 0
2005-01-01 USC00207320 TMAX 150
2005-01-01 USC00207320 TMIN -11
2005-01-01 USW00014833 TMIN -44
2005-01-01 USW00014833 TMAX 33
.....
2015-12-31 USC00200032 TMAX 11
2015-12-31 USC00205050 TMIN -17
2015-12-31 USC00208202 TMAX 0
2015-12-31 USC00201250 TMIN -6
2015-12-31 USC00200230 TMIN -17
My attempt:
dfMax = df2[df2["Element"] == "TMAX"]
dfMin = df2[df2["Element"] == "TMIN"]
dfMinD = dfMin.Data_Value.resample("d").min()
dfMaxD = dfMax.Data_Value.resample("d").max()
I'm stuck here. The plot shows years in x-axis instead of months.
I reckon I need to find the max and min temperature by month instead? How can I group them by months to show the daily max and min of each month(Jan - Dec) through out the period of 2005-2014?
First solution is change d
to MS
for start of months:
dfMax = df2[df2["Element"] == "TMAX"]
dfMin = df2[df2["Element"] == "TMIN"]
dfMinD = dfMin.Data_Value.resample("MS").min()
dfMaxD = dfMax.Data_Value.resample("MS").max()
df = pd.concat([dfMaxD, dfMinD], axis=1, keys=('max','min'))
Another solution for months periods with DatetimeIndex.to_period
pass to DataFrame.groupby
and aggregate min
and max
:
dfMax = df2[df2["Element"] == "TMAX"]
dfMin = df2[df2["Element"] == "TMIN"]
dfMaxD = dfMax.groupby(dfMax.index.to_period('m'))['Data_Value'].max()
dfMinD = dfMin.groupby(dfMin.index.to_period('m'))['Data_Value'].min()
print (dfMinD)
Date
2005-01 -44
2015-12 -17
Freq: M, Name: Data_Value, dtype: int64
print (dfMaxD)
Date
2005-01 150
2015-12 11
Freq: M, Name: Data_Value, dtype: int64
df = pd.concat([dfMaxD, dfMinD], axis=1, keys=('max','min'))
print (df)
max min
Date
2005-01 150 -44
2015-12 11 -17
Or if need one solution for both first create months PeriodIndex, then reshape by Series.unstack
and grouping by first level aggregate min
and max
:
df = (df2.set_index(df2.index.to_period('m'))
.set_index(['ID','Element'], append=True)['Data_Value']
.unstack()
.groupby(level=0)
.agg({'TMAX':'max', 'TMIN':'min'}))
print (df)
TMAX TMIN
Date
2005-01 150.0 -44.0
2015-12 11.0 -17.0