Search code examples
pythonpandasmatplotlibplotpython-datetime

Groupby max and min value of each month over the years


I'm trying to plot a chart that shows daily Max and Min temperature from the period of 2005-2014. The temperature is denoted in the Data_Value column.

I have a hard time trying to figure out how to proceed here. I need to show the x axis in months instead of years or days.

Dataset

                     ID Element  Data_Value
Date                                       
2005-01-01  USW00004848    TMIN           0
2005-01-01  USC00207320    TMAX         150
2005-01-01  USC00207320    TMIN         -11
2005-01-01  USW00014833    TMIN         -44
2005-01-01  USW00014833    TMAX          33
.....                                     
2015-12-31  USC00200032    TMAX          11
2015-12-31  USC00205050    TMIN         -17
2015-12-31  USC00208202    TMAX           0
2015-12-31  USC00201250    TMIN          -6
2015-12-31  USC00200230    TMIN         -17

My attempt:

  1. Extract the TMAX and TMIN from the dataset respectively
dfMax = df2[df2["Element"] == "TMAX"] 
dfMin = df2[df2["Element"] == "TMIN"]
  1. Find the daily Max and Min
dfMinD = dfMin.Data_Value.resample("d").min()
dfMaxD = dfMax.Data_Value.resample("d").max()

I'm stuck here. The plot shows years in x-axis instead of months.

I reckon I need to find the max and min temperature by month instead? How can I group them by months to show the daily max and min of each month(Jan - Dec) through out the period of 2005-2014?


Solution

  • First solution is change d to MS for start of months:

    dfMax = df2[df2["Element"] == "TMAX"] 
    dfMin = df2[df2["Element"] == "TMIN"]
    
    dfMinD = dfMin.Data_Value.resample("MS").min()
    dfMaxD = dfMax.Data_Value.resample("MS").max()
    
    df = pd.concat([dfMaxD, dfMinD], axis=1, keys=('max','min'))
    

    Another solution for months periods with DatetimeIndex.to_period pass to DataFrame.groupby and aggregate min and max:

    dfMax = df2[df2["Element"] == "TMAX"] 
    dfMin = df2[df2["Element"] == "TMIN"]
    
    dfMaxD = dfMax.groupby(dfMax.index.to_period('m'))['Data_Value'].max()
    dfMinD = dfMin.groupby(dfMin.index.to_period('m'))['Data_Value'].min()
    
    print (dfMinD)
    Date
    2005-01   -44
    2015-12   -17
    Freq: M, Name: Data_Value, dtype: int64
    
    print (dfMaxD)
    Date
    2005-01    150
    2015-12     11
    Freq: M, Name: Data_Value, dtype: int64
    
    df = pd.concat([dfMaxD, dfMinD], axis=1, keys=('max','min'))
    print (df)
             max  min
    Date             
    2005-01  150  -44
    2015-12   11  -17
    

    Or if need one solution for both first create months PeriodIndex, then reshape by Series.unstack and grouping by first level aggregate min and max:

    df = (df2.set_index(df2.index.to_period('m'))
             .set_index(['ID','Element'], append=True)['Data_Value']
             .unstack()
             .groupby(level=0)
             .agg({'TMAX':'max', 'TMIN':'min'}))
    print (df)
              TMAX  TMIN
    Date                
    2005-01  150.0 -44.0
    2015-12   11.0 -17.0