I have this df:
nome_socio cnpj_cpf_socio municipio
Alexandre AAA Curitiba
Alexandre AAA Rio
Alexandre AAA Porto Alegre
Bruno BBB Porto Alegre
Bruno BBB Porto Alegre
I want to get the mode for rows with the same nome_socio
and cnpj_cpf_socio
. For that I'm using the following code:
moda_municipio=df[['nome_socio','cnpj_cpf_socio','municipio']].groupby(['nome_socio','cnpj_cpf_socio'])['municipio'].apply(pd.Series.mode).to_frame().reset_index().rename(columns={'municipio':"cidade_pred"})
It does find the mode, however since for Alexandre
+ AAA
rows there is a draw between the three municipios
it returns three different rows. I'm getting this result:
nome_socio cnpj_cpf_socio level_2 cidade_pred
0 Alexandre AAA 0 Curitiba
1 Alexandre AAA 1 Porto Alegre
2 Alexandre AAA 2 Rio
3 Bruno BBB 0 Porto Alegre
I need to make it look like this:
nome_socio cnpj_cpf_socio level_2 cidade_pred
Alexandre AAA 0 Curitiba, Porto Alegre, Rio
Bruno BBB 0 Porto Alegre
Is there a way to do it?
We should do mode
first then join
the result
df.groupby(['nome_socio','cnpj_cpf_socio'])['cidade_pred'].agg(lambda x : ','.join(x.mode().tolist()))