Search code examples
pandasdataframecountpandas-groupbycounting

How to separate entries, and count the occurrences


I'm trying to count which country most celebrities come from. However the csv that I'm working with has multiple countries for a single celeb. e.g. "France, US" for someone with a double nationality.

To count the above, I can use .count() for the entries in the "nationality" column. But, I want to count France, US and any other country separately.

I cannot figure out a way to separate all the entries in column and then, count the occurrences.

I want to be able to reorder my dataframe with these counts, so I want to count this inside the structure

data.groupby(by="nationality").count()

This returns some faulty counts of

"France, US" 1


Solution

  • Assuming this type of data:

    data = pd.DataFrame({'nationality': ['France','France, US', 'US', 'France']})
    
      nationality
    0      France
    1  France, US
    2          US
    3      France
    

    You need to split and explode, then use value_counts to get the sorted counts per country:

    out = (data['nationality']
           .str.split(', ')
           .explode()
           .value_counts()
          )
    

    Output:

    France    3
    US        2
    Name: nationality, dtype: int64