OK, how do I use a regex to remove http and/or www, so that http://www.domain.com/ becomes domain.com?
Assume x is any kind of TLD or ccTLD.
Input example:
www.domain.x
Output:
domain.x
If you really want to use regular expressions instead of urlparse()
or splitting the string:
>>> import re
>>> domain = 'http://www.example.com/'
>>> re.match(r'(?:\w*://)?(?:.*\.)?([a-zA-Z0-9-]*\.[a-zA-Z]{1,}).*', domain).groups()[0]
'example.com'
The regular expression might be a bit simplistic, but it works. It also extracts the domain rather than replacing the prefix, because I think getting the domain out is easier.
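If you do want the replacing variant the question asks for, here is a minimal sketch. The helper name strip_prefix is made up for illustration, and it assumes the only prefixes to remove are a scheme and a leading "www.":

```python
import re

def strip_prefix(url):
    # Hypothetical helper: remove an optional scheme ("http://", "https://", ...)
    # and an optional leading "www." from the start of the string.
    host = re.sub(r'^(?:\w+://)?(?:www\.)?', '', url)
    # Drop the path, if any, by keeping only what precedes the first "/".
    return host.split('/')[0]

print(strip_prefix('http://www.domain.com/'))
print(strip_prefix('www.domain.x'))
```

Unlike the matching approach, this keeps any other subdomains in place, which may or may not be what you want.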
To support domains like 'co.uk', one can do the following:
>>> domain = 'http://www.google.co.uk/'
>>> p = re.compile(r'(?:\w*://)?(?:.*?\.)?(?:([a-zA-Z0-9-]*)\.)?([a-zA-Z0-9-]*\.[a-zA-Z]{1,}).*')
>>> p.match(domain).groups()
('google', 'co.uk')
So you have to check whether the second group is a suffix like 'co.uk' and, if it is, join the two groups back together. Normal domains should work OK. I could not make it work when there are multiple subdomains.
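The check-and-join step could be sketched like this. The function name registered_domain and the tiny TWO_PART_SUFFIXES set are assumptions for illustration; a real list of such suffixes is far longer:

```python
import re

p = re.compile(r'(?:\w*://)?(?:.*?\.)?(?:([a-zA-Z0-9-]*)\.)?([a-zA-Z0-9-]*\.[a-zA-Z]{1,}).*')

# Assumption: a small hand-written sample of two-part suffixes.
TWO_PART_SUFFIXES = {'co.uk', 'org.uk', 'com.au'}

def registered_domain(url):
    m = p.match(url)
    if m is None:
        # No dot in the input at all, e.g. 'localhost'.
        return None
    name, rest = m.groups()
    # Join the groups again only when the second one is a known two-part suffix.
    if name is not None and rest in TWO_PART_SUFFIXES:
        return name + '.' + rest
    return rest

print(registered_domain('http://www.google.co.uk/'))
print(registered_domain('http://www.example.com/'))
```

The multiple-subdomain limitation mentioned above still applies to this sketch.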
One-liner without regular expressions or fancy modules:
>>> domain = 'http://www.example.com/'
>>> '.'.join(domain.replace('http://','').split('/')[0].split('.')[-2:])
'example.com'
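The one-liner above only strips 'http://'. A slightly more general sketch of the same splitting idea, with a made-up function name last_two_labels, keeps whatever follows '://' so that other schemes work too:

```python
def last_two_labels(url):
    # Keep what follows "://" (if present), then cut off the path.
    host = url.split('://', 1)[-1].split('/')[0]
    # Keep only the last two dot-separated labels.
    return '.'.join(host.split('.')[-2:])

print(last_two_labels('https://www.example.com/path'))
print(last_two_labels('www.domain.x'))
```

Like the original one-liner, this returns only 'co.uk' for something like 'www.google.co.uk', since it always keeps exactly two labels.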