Search code examples
pythonpandasdataframeloopspython-itertools

compare 2 dataframes simultaneously - 2 itertuples?


Im comparing 2 dataframes and Id like see if the the name matches on the address then to pull the unique ID. otherwise, continue on and search for the best match. (Im using fuzzy matcher for that part)

I was exploring itertools and wondered if using the itertools.zip_longest option would work simultaneously to compare 2 items togther. rather than using 2 for loops (example for x in df1.itertuples: do something... for y in df2.itertuples: do something) would something like this work?

result = itertools.zip_longest(df1.itertuples(), df2.itertuples())

Here's my 2 dataframes -

Here's my DF1:

         NAME          ADDRESS       CUSTOMER_SUPPLIER_NUMBER      Sales  Calls  Target
0        OFFICE 1        123 road               2222277            84    170     265
1        OFFICE 2        15  lane               2222289            7     167     288
2        OFFICE 3        3 highway              1111111            1       2     286
3        OFFICE 4        2 street               1111171            95    193    299
4        OFFICE 5        1 place                1111191            9     193    298
DF2:
             NAME       ADDRESS        CUSTOMER_SUPPLIER_NUMBER          UNIQUE ID     
0        OFFICE 1        123 road             2222277                       014168     
1        OFFICE 2        15 lane              2222289                       131989          
2        OFFICE 3        3 highway            1111111                       149863            
3        OFFICE 4        2 street             1111171                       198664        
4        OFFICE 5        1 place             1111191                        198499
5        OFFICE 6        zzzz rd              165198                        198791
6        OFFICE 7        5z st                 19844                        298791      
7        OFFICE 8        34 hwy               981818                        398791
8        OFFICE 9        81290 rd             899811                        498791
9        OFFICE 10        59 rd               699161                        598791
10      OFFICE  11        5141 bldvd          33211                         698791

Then perform a for loop and do some comparison if statements. I can access both items side by side but how would I then loop over the items to do the check?

Right now im getting: " TypeError: 'NoneType' object is not subscriptable"

for yy in  result:
     if yy[0][1]== yy[1][1]:
         print(yy) ......

Solution

  • If your headers are the same in both df´s, just apply merge:

    dfmerge=pd.merge(df1,df2)
    

    the output should be:

    dfmerge