Calculate mode on a dataframe without sorting the result


I have a data frame like this:

df = pd.DataFrame({'a1': [2,3,4,8,8], 'a2': [2,5,7,5,10], 'a3':[1,9,4,10,2]})

    a1  a2  a3
0   2   2   1
1   3   5   9
2   4   7   4
3   8   5   10
4   8   10  2

The output should be:

0  2 
1  3
2  4
3  8 
4  8

I want to calculate the mode row-wise; if a row has no mode (no value repeats), I want the value from a1 (the first column).

For example, in the second row (3, 5, 9) no value repeats, so the output is 3.

Note: I've already tried df.mode(axis=1), but it seems to reorder the values within each row, so I don't always get the first column's value in the output.


Solution

  • No-Sort Methods

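    agg + collections.Counter. Does not sort the modes: Counter.most_common returns tied elements in the order first encountered (Python 3.7+), so a row with no repeated value yields its a1 value.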

    from collections import Counter
    df.agg(lambda x: Counter(x).most_common(1)[0][0], axis=1)
    
    0    2
    1    3
    2    4
    3    8
    4    8
    dtype: int64
    

  • Mode Sorting Methods

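    1. Use mode along axis=1 and take the first column of the result. The modes of each row come back sorted, so a tie resolves to the smallest value rather than the a1 value (rows 3 and 4 in the output):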

      df.mode(axis=1).iloc[:, 0]
      

      Or,

      df.mode(axis=1)[0] 
      

      0    2.0
      1    3.0
      2    4.0
      3    5.0
      4    2.0
      Name: 0, dtype: float64
      
    2. scipy.stats.mode. In the output, ties again resolve to the smallest value:

      import numpy as np
      from scipy.stats import mode
      np.array(mode(df, axis=1))[0].squeeze()

      array([2, 3, 4, 5, 2])
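
The fallback to a1 can also be written out explicitly instead of relying on tie order. This is a minimal sketch, assuming the df from the question; row_mode_or_first is a hypothetical helper name, not a pandas function.

```python
from collections import Counter

import pandas as pd


def row_mode_or_first(row: pd.Series):
    """Return the most frequent value in the row, or the first value if nothing repeats."""
    value, count = Counter(row).most_common(1)[0]
    # count == 1 means every value in the row is unique, so there is no mode.
    return value if count > 1 else row.iloc[0]


df = pd.DataFrame({'a1': [2, 3, 4, 8, 8], 'a2': [2, 5, 7, 5, 10], 'a3': [1, 9, 4, 10, 2]})

# Apply the helper to each row (axis=1) and collect the results in a Series.
result = df.apply(row_mode_or_first, axis=1)
print(result.tolist())  # [2, 3, 4, 8, 8]
```

On this data it gives the same result as the Counter one-liner, but the fallback no longer depends on how most_common orders ties. If two different values repeat equally often in a row, the one that appears first in the row is still returned.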