Search code examples
pythonpandasdataframemergegeopandas

Returning NAN values when merging geodataframe with dataframe


I read in the data as following

polygons = polygons.geojson
polygons = gpd.read_file(polygons)

income = income.csv
provi_income = pd.read_csv(income)

income

    Primary_IncomeNet    identificatie
1               14734    PV20      
2               16502    PV21      
3               12917    PV22      
4               31189    PV23      
5               12060    PV24      
6               59168    PV25      
7               45021    PV26      
8               94199    PV27      
9              111357    PV28      
10              10405    PV29      
11              76537    PV30      
12              29362    PV31  
polygons

  identificatie                                           geometry
0           PV20  MULTIPOLYGON (((265275.541 549247.459, 265285....
1           PV22  MULTIPOLYGON (((231437.815 516445.643, 231430....
2           PV26  MULTIPOLYGON (((131894.470 429932.357, 131917....
3           PV24  MULTIPOLYGON (((157694.139 473920.680, 159406....
4           PV23  MULTIPOLYGON (((248291.900 459808.449, 248302....
5           PV27  MULTIPOLYGON (((131700.944 464257.265, 131702....
6           PV25  MULTIPOLYGON (((181361.527 418255.386, 181384....
7           PV28  MULTIPOLYGON (((88397.000 413853.999, 89142.01...
8           PV21  MULTIPOLYGON (((189491.268 535832.617, 189494....
9           PV30  MULTIPOLYGON (((167891.450 359190.720, 168085....
10          PV31  MULTIPOLYGON (((199549.696 308385.049, 199558....
11          PV29  MULTIPOLYGON (((50235.786 357928.267, 50243.18...

And then merge based on common id. This merges the two data frames but the Primary_IncomeNet column shows NaN. Why does the Primary_IncomeNet column not merge properly and how do perform the merge correctly?

merge = polygons.merge(income, on="identificatie", how= 'left')
 identificatie                                           geometry   Primary_IncomeNet
0           PV20  MULTIPOLYGON (((265275.541 549247.459, 265285....  NaN
1           PV22  MULTIPOLYGON (((231437.815 516445.643, 231430....  NaN
2           PV26  MULTIPOLYGON (((131894.470 429932.357, 131917....  NaN
3           PV24  MULTIPOLYGON (((157694.139 473920.680, 159406....  NaN
4           PV23  MULTIPOLYGON (((248291.900 459808.449, 248302....  NaN
5           PV27  MULTIPOLYGON (((131700.944 464257.265, 131702....  NaN
6           PV25  MULTIPOLYGON (((181361.527 418255.386, 181384....  NaN
7           PV28  MULTIPOLYGON (((88397.000 413853.999, 89142.01...  NaN
8           PV21  MULTIPOLYGON (((189491.268 535832.617, 189494....  NaN
9           PV30  MULTIPOLYGON (((167891.450 359190.720, 168085....  NaN
10          PV31  MULTIPOLYGON (((199549.696 308385.049, 199558....  NaN
11          PV29  MULTIPOLYGON (((50235.786 357928.267, 50243.18...  NaN

Solution

  • Based on the alignment of the column (identificatie) of the dataframe income, I suspect that she holds trailing whitespaces since their values are aligned to the left. So, try to strip them before the merge.

    income["identificatie"] = income["identificatie"].str.strip()
    ​
    merge = polygons.merge(income, on="identificatie", how= "left")
    

    ​ Output:

    print(merge)
    
       identificatie                                           geometry  Primary_IncomeNet
    0           PV20  MULTIPOLYGON (((265275.541 549247.459, 265285....              14734
    1           PV22  MULTIPOLYGON (((231437.815 516445.643, 231430....              12917
    2           PV26  MULTIPOLYGON (((131894.470 429932.357, 131917....              45021
    3           PV24  MULTIPOLYGON (((157694.139 473920.680, 159406....              12060
    4           PV23  MULTIPOLYGON (((248291.900 459808.449, 248302....              31189
    5           PV27  MULTIPOLYGON (((131700.944 464257.265, 131702....              94199
    6           PV25  MULTIPOLYGON (((181361.527 418255.386, 181384....              59168
    7           PV28  MULTIPOLYGON (((88397.000 413853.999, 89142.01...             111357
    8           PV21  MULTIPOLYGON (((189491.268 535832.617, 189494....              16502
    9           PV30  MULTIPOLYGON (((167891.450 359190.720, 168085....              76537
    10          PV31  MULTIPOLYGON (((199549.696 308385.049, 199558....              29362
    11          PV29  MULTIPOLYGON (((50235.786 357928.267, 50243.18...              10405