Get country name from dataframe column by comparing with a list


How can I get the country name from a dataframe column by comparing it against a list of strings that contain country names?

For example:

import pandas as pd

# Named country_list rather than list, to avoid shadowing the built-in
country_list = ["pakistan","united kingdom","uk","usa","united states","uae"]

# Create a dataframe with a single column, job_location, holding each employee's job location
df = pd.DataFrame({
        'job_location' : ['birmingham, england, united kingdom','new jersey, united states','gilgit-baltistan, pakistan','uae','united states','pakistan','31-c2, gulberg 3, lahore, pakistan'],
    })
df 
job_location
0   birmingham, england, united kingdom
1   new jersey, united states
2   gilgit-baltistan, pakistan
3   uae
4   united states
5   pakistan
6   31-c2, gulberg 3, lahore, pakistan

I need a new column in the dataframe, named country, that contains the country name taken from the job_location column.


Solution

  • This does not assume that the country is always at the end of the string; it checks each location against every name in the list:

    import pandas as pd
    
    country_list = ["pakistan","united kingdom","uk","usa","united states","uae"]
    
    # Create a dataframe with a single column, job_location, holding each employee's job location
    df = pd.DataFrame({
            'job_location' : ['birmingham, england, united kingdom','new jersey, united states','gilgit-baltistan, pakistan','uae','united states','pakistan','31-c2, gulberg 3, lahore, pakistan'],
        })
    
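    matching_countries = []
    
    # Record exactly one entry per row: the first country found in the text,
    # or None when nothing matches, so the list length always equals len(df).
    for text in df['job_location']:
        match = next((country for country in country_list if country in text), None)
        matching_countries.append(match)
    
    df['country'] = matching_countries
    
    print(df)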
    

    Outputs:

                              job_location         country
    0  birmingham, england, united kingdom  united kingdom
    1            new jersey, united states   united states
    2           gilgit-baltistan, pakistan        pakistan
    3                                  uae             uae
    4                        united states   united states
    5                             pakistan        pakistan
    6   31-c2, gulberg 3, lahore, pakistan        pakistan
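
As a possible alternative to the explicit loop, the same lookup can be sketched with pandas' vectorized string methods. This is only a sketch under a couple of assumptions that are not in the original post: country names are matched as whole words (so "uk" would not match inside "ukraine"), and longer names are tried first. The names `alternatives` and `pattern` are illustrative.

```python
import re

import pandas as pd

country_list = ["pakistan", "united kingdom", "uk", "usa", "united states", "uae"]

df = pd.DataFrame({
    "job_location": [
        "birmingham, england, united kingdom",
        "new jersey, united states",
        "gilgit-baltistan, pakistan",
        "uae",
        "united states",
        "pakistan",
        "31-c2, gulberg 3, lahore, pakistan",
    ],
})

# Assumption: try longer names first so a short name never shadows a longer
# one that starts at the same position.
alternatives = sorted(country_list, key=len, reverse=True)

# Assumption: match whole words only (\b), escaping each name so it is
# treated as literal text inside the regular expression.
pattern = r"\b(" + "|".join(re.escape(name) for name in alternatives) + r")\b"

# str.extract returns the first match of the capture group per row,
# and NaN for rows where no country name is found.
df["country"] = df["job_location"].str.extract(pattern, expand=False)

print(df)
```

With the sample data this should give the same country column as the loop above. Rows without any listed country end up as NaN, which can then be handled with `fillna` or `dropna` as needed.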