How to get range of values in secondary index of pandas dataframe


I have a multi-indexed pandas dataframe with two index levels. The first level is 'room', the second is 'timestamp'. The columns of this table are 'total occupancy', 'temperature', 'power used' and 'event'.

The situation being tracked is this: consider a hotel with several ballrooms. These rooms get booked for events. Periodically, hotel management records the occupancy of each room and the room temperature.

I want to group by 'event' and get the difference between the max and min of 'total occupancy' and 'temperature'. I also want to get the difference between the max and min of the timestamps, so I can measure event length, but have been unable to.

For example, consider the following df:

import pandas as pd

# Initialize df with a (Room, TimeStamp) MultiIndex
rm_timestamp_indices = [('A', 1300), ('A', 1310), ('A', 1315),
                        ('B', 1200), ('B', 1230), ('B', 1350),
                        ('C', 1300), ('C', 1400)]
multi_index = pd.MultiIndex.from_tuples(rm_timestamp_indices, names=['Room', 'TimeStamp'])
df = pd.DataFrame(index=multi_index)

# Put data into df
df['temp'] = [77, 78, 73, 80, 76, 66, 73, 70]
df['pop'] = [100, 110, 200, 300, 315, 290, 245, 250]
df['event'] = ['q', 'q', 'w', 'r', 't', 't', 's', 's']

Now, I can get the difference between the max and min of each column with

df.groupby('event').apply(lambda x:x.max()-x.min())

but I have not been able to also get the difference between the max and min of the timestamps for each event.


Solution

  • You could use reset_index to move the TimeStamp index level into the dataframe as a regular column before grouping:

    df.reset_index('TimeStamp').groupby('event').apply(lambda x:x.max()-x.min())
    

    Output:

           TimeStamp  temp  pop
    event
    q             10     1   10
    r              0     0    0
    s            100     3    5
    t            120    10   25
    w              0     0    0
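
A possible variation on the approach above, shown only as a sketch: select the numeric columns explicitly and use agg with a small range function, so the string-valued 'event' column is never passed to the subtraction. The helper name value_range is hypothetical (it is not part of pandas or of the original answer), and the frame is rebuilt from the question's example data.

```python
import pandas as pd

# Rebuild the example frame from the question.
rm_timestamp_indices = [('A', 1300), ('A', 1310), ('A', 1315),
                        ('B', 1200), ('B', 1230), ('B', 1350),
                        ('C', 1300), ('C', 1400)]
multi_index = pd.MultiIndex.from_tuples(rm_timestamp_indices, names=['Room', 'TimeStamp'])
df = pd.DataFrame(index=multi_index)
df['temp'] = [77, 78, 73, 80, 76, 66, 73, 70]
df['pop'] = [100, 110, 200, 300, 315, 290, 245, 250]
df['event'] = ['q', 'q', 'w', 'r', 't', 't', 's', 's']


def value_range(s):
    """Hypothetical helper: max minus min of one column within one group."""
    return s.max() - s.min()


ranges = (
    df.reset_index('TimeStamp')                      # TimeStamp becomes a column
      .groupby('event')[['TimeStamp', 'temp', 'pop']]  # numeric columns only
      .agg(value_range)                              # applied per column, per group
)
print(ranges)
```

With the sample data this should give the same per-event ranges as the output shown above. Selecting the columns up front also makes it easy to apply the range to only some of them.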