Search code examples
pythonpython-3.xurldnsurlopen

Convert domain name to its URL format for URL parsing


In my Python script regarding URL html.text parsing, the input to my application is fixed i.e the domain name.

However I need to store and process that domain name into its URL format. I feel it is not advisable to simply prepend 'https://' to the domain name for the purpose.

As seen below, URL pasring fails because it is receives a domain format not a URL.

from urllib.request import Request, urlopen
import requests

url = 'xyz.com' # it is a domain name. But requires it to be in URL format to perform further parsing.

# Option 1
html=urlopen(url).read()

# Option 2
resp = requests.get(url)
html = resp.text

# Error encountered: Invalid URL.

What is a good way to convert a domain name to its URL format?


Solution

  • If you want to find out whether "http://"+url or "https://"+url is working, you could just check both:

    from urllib.request import urlopen
    from urllib.error import URLError
    
    url = 'yourpage.com'
    try:
      html=urlopen("https://"+url).read()
    except URLError:
      html=urlopen("http://"+url).read()