python pandas dataframe data-cleaning zipcode

Extracting zip code from a string with full address


I have scraped some websites to gather company data, including addresses. Because of how the HTML is structured, I could only scrape each address as a single string from one tag. An example of the output can be seen below.

Streetname housenumber zip-code city country
Street 1 1234 AB Amsterdam Netherlands
Longerstreetname 22 9876 XY Den Haag Netherlands
Name: Address, Length: 314, dtype: object

Now I need to extract the ZIP code (and only the ZIP code) into a new column for further analysis, because I need to find out which province each company is located in. I mostly use pandas in my data-cleaning phase.

I have searched for a method to extract the ZIP code, but I did not succeed. Any help would be very much appreciated!



Solution

  • I think you can use regex.

    Example:

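    import re

    address = '7802 Grant Avenue Egg Harbor Township, NJ 08234'
    # US ZIP: five digits, optionally followed by a hyphen and four more (ZIP+4)
    us_zip = r'\b(\d{5}(?:-\d{4})?)\b'
    zip_code = re.search(us_zip, address)
    # re.search returns None when nothing matches, so guard before calling group()
    print(zip_code.group(1) if zip_code else None)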

    Important note: There is no single postal code pattern that works worldwide. If you scrape companies from different countries, you need a separate regex for each country's format.

    I hope this file helps: zip codes regex
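
For the Dutch addresses in the question, the same idea can be applied to a whole pandas column with `Series.str.extract`. This is only a minimal sketch: the two sample rows are copied from the question, the column name `zip_code` is made up for illustration, and the pattern assumes the usual Dutch postcode shape (four digits not starting with 0, an optional space, two uppercase letters).

```python
import pandas as pd

# Sample rows copied from the question; the Series name mirrors its output.
addresses = pd.Series(
    [
        "Street 1 1234 AB Amsterdam Netherlands",
        "Longerstreetname 22 9876 XY Den Haag Netherlands",
    ],
    name="Address",
)

# Assumed Dutch postcode shape: four digits (first one 1-9),
# an optional space, then exactly two uppercase letters.
nl_postcode = r"\b([1-9]\d{3}\s?[A-Z]{2})\b"

df = addresses.to_frame()
# With one capture group and expand=False, str.extract returns a Series;
# rows where the pattern does not match get NaN.
df["zip_code"] = df["Address"].str.extract(nl_postcode, expand=False)

print(df)
```

Rows that come back as NaN are worth inspecting by hand, since scraped addresses may not all follow this shape. The new column can then be joined against whatever postcode-to-province lookup you have available.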