Search code examples
pythonpandasdataframeaggregatemedian

Multiple operations on Dataframe


I am trying to group by one/two columns, Sum the value of forth column and find mean in fifth column. Every operation to be written in separate output. Feels a bit tricky to make even a start.

Input: No header, has more than 100k rows

StartTime, EndTime,Day,SumCount,UniqueCount
00:00:00,01:00:00,Mon,13534,594
01:00:00,02:00:00,Mon,16674,626
02:00:00,03:00:00,Mon,23736,671
03:00:00,04:00:00,Mon,16977,671
00:00:00,01:00:00,Tue,17262,747
01:00:00,02:00:00,Tue,19072,777
02:00:00,03:00:00,Tue,18275,785
03:00:00,04:00:00,Tue,13589,757
04:00:00,05:00:00,Tue,16053,735
05:00:00,06:00:00,Tue,11440,636

I am trying to find is

  1. Groupby StartTime & EndTime find Sum of SumCount and Median of UniqueCount
  2. Groupby Dayfind Sum of SumCount and Median of UniqueCount

I am lost how to put the two types of groupby in one program and get two different output.


Solution

  • Try:

    df.groupby(['StartTime', 'EndTime']).agg({'SumCount': ['sum'],
                                 'UniqueCount': {'median': lambda x: np.median(x).round(0)}})
    

    enter image description here

    Or:

    df.groupby(['Day']).agg({'SumCount': ['sum'],
                             'UniqueCount': {'median': lambda x: np.median(x).round(0)}})
    

    enter image description here