python dataframe postal-code

Python - validate if an item in one dataframe column exists in the column of another dataframe


I am trying to validate a dataframe column df['Postcode'] containing Scottish postcodes. I have two CSVs containing almost all possible Scottish postcodes (small_df and large_df), and I wish to loop through these and (for now) return the postcodes in my original dataframe that do not match any of the entries in these CSVs.

The data in each dataframe (simplified below) is a UK postal code stripped of spaces, e.g. PA29DE, of string type.

Case Postcode
1 PA29DE
2 PH29AD
3 nan
4 KW102ZE
5 KW123DE

I am using the following loop to do this, but it simply returns a list of all the entries in df['Postcode'].

for i in df['Postcode']:
    if i not in small_df['Postcode'] or large_df['Postcode']:
        print(i)

I was expecting only the entries in df which are not in small_df or large_df. I'm really not sure how to proceed from here, and I can't find any other solutions which work.


Solution

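  • There is an error in your condition: 'or' combines two separate expressions, and large_df['Postcode'] on its own is not a membership test, so the check has to be written out in full for each dataframe. Because you want the postcodes found in neither dataframe, join the two checks with 'and'. Note also that 'in' applied to a pandas Series tests the index labels rather than the values, so compare against .values instead:

    if i not in small_df['Postcode'].values and i not in large_df['Postcode'].values:
        print(i)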
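
As an alternative to the explicit loop, the same check can be sketched with pandas' vectorised Series.isin. This is a minimal sketch rather than the answerer's method: the sample rows in small_df and large_df are hypothetical stand-ins for the two CSVs, and the names valid and unmatched are introduced only for illustration. Missing postcodes are assumed to be excluded from the result.

```python
import pandas as pd

# Hypothetical sample data; in practice these would be loaded with pd.read_csv(...)
df = pd.DataFrame({
    "Case": [1, 2, 3, 4, 5],
    "Postcode": ["PA29DE", "PH29AD", None, "KW102ZE", "KW123DE"],
})
small_df = pd.DataFrame({"Postcode": ["PA29DE", "KW102ZE"]})
large_df = pd.DataFrame({"Postcode": ["PH29AD", "AB101AA"]})

# `in` on a Series checks the index labels (0, 1, ...), not the stored values
print("PA29DE" in small_df["Postcode"])         # False
print("PA29DE" in small_df["Postcode"].values)  # True

# Combine both reference columns into one set of known postcodes
valid = set(small_df["Postcode"]).union(large_df["Postcode"])

# Keep rows whose postcode is present but found in neither reference list
mask = df["Postcode"].notna() & ~df["Postcode"].isin(valid)
unmatched = df[mask]
print(unmatched["Postcode"].tolist())  # ['KW123DE']
```

If missing postcodes should be reported as invalid too, drop the notna() term from the mask.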