
How to label domains and subdomains using Python?


I'm working with URL data and I'm having trouble classifying each URL as a domain or a subdomain using Python.

I'm trying to use a regex to extract the domain, but I don't know how to turn that into a True or False value that says whether the URL is a subdomain.

For example, given

a = ['facebook.com', 'profile.facebook.com']

I expect the result to be

[False, True]

Solution

  • You need to decide how strict you want to be about what counts as a valid domain name. The rest can look like this:

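    >>> import re
    >>> # One or more subdomain labels, then the registered name, then .com.
    >>> # A raw string keeps the backslashes from being read as string escapes.
    >>> pattern = re.compile(r'(?:[0-9a-z-]+\.)+[0-9a-z-]+\.com$')
    >>> bool(pattern.match('facebook.com'))
    False
    >>> bool(pattern.match('sub.facebook.com'))
    True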
    

    Here I assumed the domain will end with .com, but you can easily change that.
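
To get the [False, True] list from the question, the same idea can be applied to every entry. The following is only a minimal sketch: the helper name is_subdomain is made up for illustration, and it assumes every name ends in a single-label suffix such as .com, so any name with more than two labels counts as a subdomain.

```python
import re

# Hypothetical helper, not part of any library.
# One DNS label: lowercase letters, digits, and hyphens.
LABEL = re.compile(r'[0-9a-z-]+')

def is_subdomain(host):
    """Return True if host has more labels than 'name.suffix'.

    Assumption: the suffix is a single label (.com, .org, ...).
    """
    labels = host.lower().rstrip('.').split('.')
    if not all(LABEL.fullmatch(label) for label in labels):
        raise ValueError(f'not a valid host name: {host!r}')
    return len(labels) > 2

urls = ['facebook.com', 'profile.facebook.com']
print([is_subdomain(u) for u in urls])  # [False, True]
```

Counting labels breaks down for multi-label suffixes such as .co.uk, where example.co.uk would be reported as a subdomain. Handling those correctly requires checking names against the Public Suffix List instead of counting dots.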