Search code examples
pythonpandasdataframemedian

Getting median based on names and time


Assume I have a dataframe:

 ID              Timestamp   Temp     Height
 01    2016-01-01 09:00:00     17         10
 01    2016-01-01 10:15:00     20          5
 02    2019-05-23 07:30:00     12          3
 01    2012-01-01 09:00:00     15         12
 01    2018-01-01 09:00:00     13         18
 02    2013-11-23 17:40:00     22          9

What I want is to get the median value of column 'Height' based on 'ID', & month & hour from 'Timestamp'. An example of my desired output:

 ID    Month     Hour     Height_median
 01       01       09                12
 01       01       10                 5
 02       05       07                 3
 02       11       17                 9

Solution

  • Here you go:

    import pandas as pd 
    
    df = pd.DataFrame(dict(ID=[1,1,2,1,1,2], 
                      Timestamp=['2016-01-01 09:00:00','2016-01-01 10:15:00','2019-05-23 07:30:00',
                                '2012-01-01 09:00:00','2018-01-01 09:00:00','2013-11-23 17:40:00'],
                      Height = [10,5,3,12,18,9]))
    
    df.Timestamp = pd.to_datetime(df.Timestamp)
    df['month'] = df.Timestamp.apply(lambda x: x.month)
    df['hour'] = df.Timestamp.apply(lambda x: x.hour)
    df.groupby(['ID','month','hour'])[['Height']].agg('median').reset_index()