Fetch a particular part of the url in python

I am using python and trying to fetch a particular part of the url as below

from urlparse import urlparse as ue

url = "https://www.google.co.in"
img_url = ue(url).hostname

Result

www.google.co.in

case1:

Actually i will have a number of urls(stored in a list or some where else), so what i want is, need to find the domain name as above in the url and fetch the part after www. and before .co.in, that is the string starts after first dot and before second dot which results only google in the present scenario.

So suppose the url given is url given is www.gmail.com, i should fetch only gmail in that, so what ever the url given, the code should fetch the part thats starts with first dot and before second dot.

case2:

Also some urls may be given directly like this domain.com, stackoverflow.com without www in the url, in that cases it should fetch only stackoverflow and domain.

Finally my intention is to fetch the main name from the url that gmail, stackoverflow, google like so.....

Generally if i have one url i can use list slicing and will fetch the string, but i will have a number of ulrs, so need to fetch the wanted part like mentioned above dynamically

Can anyone please let me know how to satisfy the above concept ?

Solution

Here is my solution, at the end, domains holds a list of domains you expected.

import urlparse
urls = [
    'https://www.google.com', 
    'http://stackoverflow.com',
    'http://www.google.co.in',
    'http://domain.com',
    ]
hostnames = [urlparse.urlparse(url).hostname for url in urls]
hostparts = [hostname.split('.') for hostname in hostnames]
domains = [p[0] == 'www' and p[1] or p[0] for p in hostparts]
print domains # ==> ['google', 'stackoverflow', 'google', 'domain']

Discussion

First, we extract the host names from the list of URLs using urlparse.urlparse(). The hostnames list looks like this:

[ 'www.google.com', 'stackoverflow.com, ... ]
In the next line, we break each host into parts, using the dot as the separator. Each item in the hostparts looks like this:

[ ['www', 'google', 'com'], ['stackoverflow', 'com'], ... ]
The interesting work is in the next line. This line says, "if the first part before the dot is www, then the domain is the second part (p[1]). Otherwise, the domain is the first part (p[0]). The domains list looks like this:

[ 'google', 'stackoverflow', 'google', 'domain' ]
My code does not know how to handle login.gmail.com.hk. I hope someone else can solve this problem as I am late for bed. Update: Take a look at the tldextract by John Kurkowski, which should do what you want.