Search code examples
python-3.xpandasmax

Dataset with maximal rows by userId indicated


I have a dataframe like this:

ID date        var1 var2 var3
AB 22/03/2020  0     1   3 
AB 29/03/2020  0     3   3 
CD 22/03/2020  0     1   1

And I would like to have a new dataset that, if it is a maximal column (can happen ties too) leaves the same number of the original dataset on the rows; otherwise set -1 if it is not the maximal. So it would be:

ID date        var1   var2    var3
AB 22/03/2020  -1     -1        3 
AB 29/03/2020  -1      3        3 
CD 22/03/2020  -1      1        1

But I am not managing at all how to do this. What can I try next?


Solution

  • Select only numeric columns by DataFrame.select_dtypes:

    df1 = df.select_dtypes(np.number)
    

    Or select all columns without first two by positions by DataFrame.iloc:

    df1 = df.iloc[:, 2:]
    

    Or select columns with var label by DataFrame.filter:

    df1 = df1.filter(like='var')
    

    And then set new values by DataFrame.where with max:

    df[df1.columns] = df1.where(df1.eq(df1.max(1), axis=0), -1)
    print (df)
       ID        date  var1  var2  var3
    0  AB  22/03/2020    -1    -1     3
    1  AB  29/03/2020    -1     3     3
    2  CD  22/03/2020    -1     1     1