Search code examples
pandasdataframepandasql

Ignore minimum values from distincts rows in pandas


I've below pandas dataframe

Name    |  Value1 |  Value2
User 1  |  1      |  15
User 1  |  2      |  35
User 1  |  3      |  35
User 1  |  3      |  35
User 2  |  4      |  23
User 2  |  5      |  25
User 3  |  6      |  45
User 3  |  7      |  50

I want to select only those rows which have largest value in Value2 column using pandas dataframe

Like this:

Name    |  Value1 |  Value2
User 1  |  2      |  35
User 1  |  3      |  35
User 1  |  3      |  35
User 2  |  5      |  25
User 3  |  7      |  50

Code with description will help me more as pandas are new for me.

Thanks in Advance :)


Solution

  • Use GroupBy.transform with max for Series filled by max values per groups, compare by column Value2 with Series.eq and filter by boolean indexing:

    df1 = df[df.groupby('Name')['Value2'].transform('max').eq(df['Value2'])]
    print (df1)
    
         Name  Value1  Value2
    1  User 1       2      35
    2  User 1       3      35
    3  User 1       3      35
    5  User 2       5      25
    7  User 3       7      50
    

    Details:

    print (df.groupby('Name')['Value2'].transform('max'))
    0    35
    1    35
    2    35
    3    35
    4    25
    5    25
    6    50
    7    50
    Name: Value2, dtype: int64
    
    print (df.groupby('Name')['Value2'].transform('max').eq(df['Value2']))
    0    False
    1     True
    2     True
    3     True
    4    False
    5     True
    6    False
    7     True
    Name: Value2, dtype: bool