Search code examples
pythondataframemode

How to groupby().transform() to find mode in dataframe?


I have a dataframe like:

lst = [["High", "A"], ["High", "A"], ["High", "B"],["Medium", "A"], ["Medium", "B"], ["Medium", "C"]]

df = pd.DataFrame(lst, columns =["Class", "Grade"])

I need to get the mode (majority vote) of "Grade" in each "Class". If it's a tie vote, assign "x".

Below is what I expect to get:

Class Grade Majority_vote
High A A
High A A
High B A
Medium A x
Medium B x
Medium C x

This is my code:

df['majority_vote'] = df.groupby(['Class'])['Grade'].transform(lambda x: x.mode()[0])

I think the code will return 'nan' if it's a tie vote. Then, I will change 'nan' to 'x' later.

However, what I get is below:

Class Grade Majority_vote
High A A
High A A
High B A
Medium A A
Medium B A
Medium C A

At class "Medium", the code returns the 1st element ("A") instead of 'nan'.

Any other method is appreciated. Could you please help me? Thank you in advance.


Solution

  • The issue with using x.mode()[0] is that pd.Series(['A', 'B', 'C']).mode() evaluates to ['A', 'B', 'C']. Meanwhile, pd.Series(['A', 'A', 'B']).mode() evaluates to ['A'].

    Here is a function that will return the mode (if there is only one) and "x" if there is a tie (i.e., multiple modes).

    import pandas as pd
    lst = [["High", "A"], ["High", "A"], ["High", "B"],["Medium", "A"], ["Medium", "B"], ["Medium", "C"]]
    df = pd.DataFrame(lst, columns=["Class", "Grade"])
    
    def get_mode_or_x(series):
        mode = series.mode()
        if mode.size == 1:
            return mode[0]
        return "x"
    
    df.loc[:, "majority_vote"] = df.groupby("Class")["Grade"].transform(get_mode_or_x)
    
    index Class Grade majority_vote
    0 High A A
    1 High A A
    2 High B A
    3 Medium A x
    4 Medium B x
    5 Medium C x