
String mode aggregation with group by function

I have a DataFrame that looks like this:

Country  City
UK       London
USA      Washington
UK       London
UK       Manchester
USA      Washington
USA      Chicago

I want to group by country and, for each country, get its most frequent city.

My desired output is:

Country City
UK      London
USA     Washington

This is because London and Washington each appear 2 times, whereas Manchester and Chicago appear only once.

I tried

from scipy.stats import mode
df_summary = df.groupby('Country')['City'].\
                        apply(lambda x: mode(x)[0][0]).reset_index()

But this does not seem to work on strings.


  • I can't replicate your error, but you can use pd.Series.mode, which accepts strings and returns a Series. Use iat to extract the first value:

    res = df.groupby('Country')['City'].apply(lambda x: x.mode().iat[0]).reset_index()
      Country        City
    0      UK      London
    1     USA  Washington
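
    For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same approach. The DataFrame is rebuilt from the sample in the question, and the helper name most_common is my own, not part of pandas. Note that pd.Series.mode returns all tied values in sorted order, so when two cities are equally frequent, iat[0] picks the alphabetically first one.

```python
import pandas as pd

# Sample data rebuilt from the table in the question.
df = pd.DataFrame({
    "Country": ["UK", "USA", "UK", "UK", "USA", "USA"],
    "City": ["London", "Washington", "London",
             "Manchester", "Washington", "Chicago"],
})


def most_common(values: pd.Series) -> str:
    """Hypothetical helper: return the most frequent value in a Series.

    Series.mode() returns every tied mode in sorted order, so a tie is
    resolved by taking the first value of that sorted result.
    """
    return values.mode().iat[0]


# One row per country, holding that country's most frequent city.
res = df.groupby("Country")["City"].agg(most_common).reset_index()
print(res)
```

    If ties should be handled differently (for example, keeping every tied city), drop the iat[0] and work with the full result of mode() instead.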