Search code examples
pythonsqldataframegroup

how to get max of string in a group by in dataframe


Table :-

I/P

user_id | channel 

123.    | online
123.    | Null
123     | 

Group by user_id and max of channel

O/P

user_id | channel 

123.    | online

in SQL it is done using this query

select user_id,max(channel) from table_name group by user_id

the same functionality I need in pandas Dataframe


Solution

  • Gedas Miksenas is right... But in your data there is blank & null too, so might through error, it's better to first convert the channel column to str then do groupby... like this...

    df = pd.DataFrame({"user_id":[123,123,123],
                       "channel":["online",np.nan,""]})
    df['channel'] = df['channel'].astype(str)
    df1 = df.groupby('user_id')['channel'].max().reset_index()
    
    # Output
       user_id channel
    0      123  online