Search code examples
pythonpandasdataframemergeetl

requirements for merging two Pandas DataFrame


I do not know why I cannot merge two Dataframes based on their Alpha-3 code and iso_code as follow although the '"' were removed before merging:

import pandas as pd

vac = pd.read_csv('https://raw.githubusercontent.com/owid/covid-19-data/master/public/data/vaccinations/vaccinations.csv')
world = pd.read_csv('https://gist.githubusercontent.com/tadast/8827699/raw/f5cac3d42d16b78348610fc4ec301e9234f82821/countries_codes_and_coordinates.csv').

The outcome when I merged was just column names.

Could anyone explain, please?


Solution

  • Taking a look inside the data, you will notice that world df will contain quotes inside the Alpha-3 code:

              Country Alpha-2 code Alpha-3 code Numeric code Latitude (average)  \
    0     Afghanistan         "AF"        "AFG"          "4"               "33"   
    1         Albania         "AL"        "ALB"          "8"               "41"   
    2         Algeria         "DZ"        "DZA"         "12"               "28"   
    3  American Samoa         "AS"        "ASM"         "16"         "-14.3333"   
    4         Andorra         "AD"        "AND"         "20"             "42.5"   
    
      Longitude (average)  
    0                "65"  
    1                "20"  
    2                 "3"  
    3              "-170"  
    4               "1.6"  
    

    So you will want to get rid of that with:

    # note the space is intentional, to get rid of blank spaces
    world['Alpha-3 code'] = world['Alpha-3 code'].str.strip(' "')