Search code examples
pandaspandas-groupbypandas-apply

Pandas.Series.mode eventually have multirow results. How to fix it?


I have this df:

nome_socio   cnpj_cpf_socio   municipio
Alexandre    AAA              Curitiba
Alexandre    AAA              Rio
Alexandre    AAA              Porto Alegre
Bruno        BBB              Porto Alegre
Bruno        BBB              Porto Alegre  

I want to get the mode for rows with the same nome_socio and cnpj_cpf_socio. For that I'm using the following code:

moda_municipio=df[['nome_socio','cnpj_cpf_socio','municipio']].groupby(['nome_socio','cnpj_cpf_socio'])['municipio'].apply(pd.Series.mode).to_frame().reset_index().rename(columns={'municipio':"cidade_pred"})

It does find the mode, however since for Alexandre + AAA rows there is a draw between the three municipios it returns three different rows. I'm getting this result:

  nome_socio cnpj_cpf_socio  level_2   cidade_pred
0  Alexandre            AAA        0      Curitiba
1  Alexandre            AAA        1  Porto Alegre
2  Alexandre            AAA        2           Rio
3      Bruno            BBB        0  Porto Alegre

I need to make it look like this:

  nome_socio cnpj_cpf_socio  level_2                      cidade_pred
   Alexandre            AAA        0      Curitiba, Porto Alegre, Rio
       Bruno            BBB        0                     Porto Alegre

Is there a way to do it?


Solution

  • We should do mode first then join the result

    df.groupby(['nome_socio','cnpj_cpf_socio'])['cidade_pred'].agg(lambda x : ','.join(x.mode().tolist()))