My Python script parses the HTML text of a web page. The only input to the application is a fixed domain name.
However, I need to turn that domain name into a URL before I can process it. I suspect that simply prepending 'https://' to the domain name is not advisable.
As shown below, fetching the page fails because the functions receive a bare domain name, not a URL.
from urllib.request import Request, urlopen
import requests
url = 'xyz.com'  # a bare domain name; it must be turned into a URL before it can be fetched and parsed
# Option 1
html = urlopen(url).read()
# Option 2
resp = requests.get(url)
html = resp.text
# Error encountered: Invalid URL.
What is a good way to convert a domain name to its URL format?
If you want to find out whether "http://" + url or "https://" + url works, you can simply try both:
from urllib.request import urlopen
from urllib.error import URLError
url = 'yourpage.com'
try:
    # Try HTTPS first
    html = urlopen("https://" + url).read()
except URLError:
    # Fall back to plain HTTP if the HTTPS request fails
    html = urlopen("http://" + url).read()
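If the input might sometimes already include a scheme, the same idea can be wrapped in a small helper. The following is only a sketch: domain_to_url and fetch_html are hypothetical names chosen for illustration, the 10-second timeout is an arbitrary assumption, and the helper only adds a scheme rather than validating the domain.

```python
from urllib.error import URLError
from urllib.parse import urlsplit, urlunsplit
from urllib.request import urlopen


def domain_to_url(domain, scheme="https"):
    """Return `domain` as an absolute URL, adding `scheme` only if none is present.

    Hypothetical helper for illustration; it does not validate the domain.
    """
    domain = domain.strip()
    if "://" in domain:
        # The input already looks like a URL, so leave it unchanged.
        return domain
    # Prefixing "//" makes urlsplit treat the first component as the host.
    parts = urlsplit("//" + domain.lstrip("/"))
    return urlunsplit(
        (scheme, parts.netloc, parts.path or "/", parts.query, parts.fragment)
    )


def fetch_html(domain, timeout=10):
    """Try HTTPS first, then plain HTTP; re-raise the last error if both fail."""
    last_error = None
    for scheme in ("https", "http"):
        try:
            with urlopen(domain_to_url(domain, scheme), timeout=timeout) as resp:
                return resp.read()
        except URLError as exc:  # also covers HTTPError, a subclass of URLError
            last_error = exc
    raise last_error
```

For example, domain_to_url('xyz.com') gives 'https://xyz.com/', and fetch_html('xyz.com') would return the page body as bytes. Because HTTPError is a subclass of URLError, an HTTP error status on the HTTPS attempt also triggers the HTTP fallback, just as in the snippet above.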