I have a data frame like this:
import pandas as pd
df = pd.DataFrame({'a1': [2, 3, 4, 8, 8], 'a2': [2, 5, 7, 5, 10], 'a3': [1, 9, 4, 10, 2]})
   a1  a2  a3
0   2   2   1
1   3   5   9
2   4   7   4
3   8   5  10
4   8  10   2
The output should be:
0 2
1 3
2 4
3 8
4 8
What to do: I want to calculate the mode row-wise, and if a row has no mode (all of its values are distinct), I want the value from a1 (the first column).
For example: in the second row (3, 5, 9), no mode is present, so I get 3 in the output.
I tried:
df.mode(axis=1)
but that seems to reorder the values within each row, so I don't always get the value from the first column in the output.
No-Sort Methods
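agg + collections.Counter. This does not sort the modes: when counts are tied, most_common returns the value encountered first (Python 3.7+), which for a row with no repeated value is the one from a1.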
from collections import Counter
df.agg(lambda x: Counter(x).most_common(1)[0][0], axis=1)
0 2
1 3
2 4
3 8
4 8
dtype: int64
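The Counter approach above relies on tie ordering to fall back to a1. A minimal sketch that makes the fallback explicit might look like the following; the helper name mode_or_first is hypothetical, and it assumes "no mode" means that no value in the row repeats.

```python
from collections import Counter

import pandas as pd

df = pd.DataFrame({'a1': [2, 3, 4, 8, 8], 'a2': [2, 5, 7, 5, 10], 'a3': [1, 9, 4, 10, 2]})


def mode_or_first(row):
    """Return the most frequent value in the row, or the first column's value if nothing repeats."""
    # most_common(1) yields a single (value, count) pair for the top value
    value, count = Counter(row).most_common(1)[0]
    # count == 1 means every value in the row is distinct, so there is no mode
    return value if count > 1 else row.iloc[0]


result = df.apply(mode_or_first, axis=1)
print(result.tolist())
```

This keeps the intent visible in the code rather than depending on how Counter orders tied values.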
Mode Sorting Methods
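Use mode along axis=1 (row-wise) and then take whatever comes first. Note that mode returns the candidates for each row in sorted order, so for a row with no repeated value this gives the smallest value rather than the one from a1: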
df.mode(axis=1).iloc[:, 0]
Or,
df.mode(axis=1)[0]
0 2.0
1 3.0
2 4.0
3 5.0
4 2.0
Name: 0, dtype: float64
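If df.mode is preferred, one possible way to recover the a1 fallback is to check how many candidate modes each row produced. This is a sketch under the assumption that a row with more than one candidate should fall back to a1; the variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'a1': [2, 3, 4, 8, 8], 'a2': [2, 5, 7, 5, 10], 'a3': [1, 9, 4, 10, 2]})

# One column per candidate mode; rows with fewer candidates are padded with NaN
modes = df.mode(axis=1)

# A row has a unique mode when exactly one candidate is non-NaN
has_unique_mode = modes.notna().sum(axis=1) == 1

# Keep the mode where it is unique, otherwise fall back to the first column
result = modes[0].where(has_unique_mode, df['a1']).astype(df['a1'].dtype)
print(result.tolist())
```

The final astype undoes the float conversion that the NaN padding in modes introduces.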
scipy.stats.mode
import numpy as np
from scipy.stats import mode
np.array(mode(df, axis=1))[0].squeeze()
array([2, 3, 4, 5, 2])
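scipy.stats.mode also reports how often the mode occurs, which can be used for the a1 fallback. The following is a rough sketch, assuming a count of 1 means the row has no mode; np.ravel is used so that it should work whether or not the installed SciPy version keeps the reduced axis.

```python
import numpy as np
import pandas as pd
from scipy.stats import mode

df = pd.DataFrame({'a1': [2, 3, 4, 8, 8], 'a2': [2, 5, 7, 5, 10], 'a3': [1, 9, 4, 10, 2]})

res = mode(df.to_numpy(), axis=1)
modes = np.ravel(res.mode)    # most frequent value per row (smallest on ties)
counts = np.ravel(res.count)  # how many times that value occurs in the row

# Where nothing repeats (count == 1), take the value from the first column instead
result = np.where(counts > 1, modes, df['a1'].to_numpy())
print(result.tolist())
```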