Tags: python, comparison, string-comparison

How to compare two lists and determine if they have common string elements?


I have two lists of domains:

    domains_1 = ['google.com', 'payments-amazon.com']
    domains_2 = ['https://static-eu.payments-amazon.com/OffAmazonPayments/de/lpa/js/Widgets.js']

In this case, payments-amazon.com is the common domain. How can I find it, given that the domain names can be long and vary widely?

I have tried the following, but it only works when the domains match exactly. I need a match whenever an entry in domains_2 contains one of the domains from domains_1:

    matches = set(domains_1).intersection(domains_2)
    print(matches)

Solution

  • You can use a package like tldextract, which works well except in an AWS Lambda setup. Alternatively, you can use a function like the one below to extract the domain from each URL and then look it up in your list.

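    from urllib.parse import urlparse

    # Heuristic: these TLDs are treated as single-label suffixes (example.com);
    # any other TLD is assumed to be a two-label suffix such as co.uk.
    SINGLE_LABEL_TLDS = {'com', 'net', 'org', 'io', 'ly', 'me', 'sh', 'fm', 'us'}

    def extract_domain(url):
        parsed = urlparse(url)
        # netloc is empty for URLs without a scheme, so fall back to the path
        domain = parsed.netloc or parsed.path.split('/')[0]
        domain_parts = domain.split('.')
        if len(domain_parts) > 2:
            keep = 2 if domain_parts[-1] in SINGLE_LABEL_TLDS else 3
            return '.'.join(domain_parts[-keep:])
        return domain

    domains_1 = ['google.com', 'payments-amazon.com']
    domains_2 = ['https://static-eu.payments-amazon.com/OffAmazonPayments/de/lpa/js/Widgets.js']

    for x in domains_2:
        dom = extract_domain(x)
        if dom in domains_1:
            print(dom)  # do your thing with the matching domain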
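If domains_1 already holds the domains you care about, another option (not part of the answer above, just a sketch) is to skip guessing the registered domain and instead check whether each URL's host name equals a listed domain or ends with "." plus that domain. Matching on a label boundary avoids the false positives of a plain substring test (for example, google.com would not match notgoogle.com) and needs no hard-coded TLD list. The helper names host_of and common_domains are hypothetical.

```python
from urllib.parse import urlparse


def host_of(url):
    """Return the lower-cased host name of a URL, tolerating a missing scheme."""
    # urlparse only fills in netloc when the URL has a scheme or starts with '//'
    if '://' not in url and not url.startswith('//'):
        url = '//' + url
    return urlparse(url).hostname or ''


def common_domains(domains, urls):
    """Return the entries of `domains` that are the host, or a parent domain
    of the host, of at least one URL in `urls`."""
    hosts = {host_of(u) for u in urls}
    return [
        d for d in domains
        if any(h == d.lower() or h.endswith('.' + d.lower()) for h in hosts)
    ]


domains_1 = ['google.com', 'payments-amazon.com']
domains_2 = ['https://static-eu.payments-amazon.com/OffAmazonPayments/de/lpa/js/Widgets.js']

print(common_domains(domains_1, domains_2))  # ['payments-amazon.com']
```

This assumes the entries in domains_1 are bare domain names; if they can themselves be full URLs, pass them through host_of first.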