I want to write a Python script that downloads a web page only if the page contains HTML. I know that the Content-Type header can be used for this. Please suggest a way to do it, as I can't find a way to get the headers before the file is downloaded.
Use http.client to send a HEAD request to the URL. This returns only the headers for the resource, so you can look at the Content-Type header and check whether it is text/html. If it is, send a GET request to the URL to get the body.
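A minimal sketch of the HEAD-then-GET approach, in case it helps. The function names is_html and fetch_if_html, the 10-second timeout, and the choice to treat anything other than a 200 response as "not HTML" are my own assumptions, not part of http.client. The Content-Type value often carries a parameter (for example "text/html; charset=utf-8"), so the sketch compares only the media type. It does not follow redirects.

```python
import http.client
from urllib.parse import urlsplit


def is_html(content_type):
    """Return True if a Content-Type header value denotes HTML."""
    if not content_type:
        return False
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type == "text/html"


def _request(method, url, timeout=10):
    """Send one request; return (response, connection). Hypothetical helper."""
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc, timeout=timeout)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    conn.request(method, path)
    return conn.getresponse(), conn


def fetch_if_html(url):
    """Return the body as bytes if the URL serves HTML, otherwise None."""
    # Step 1: HEAD returns the headers only, so no body is downloaded yet.
    response, conn = _request("HEAD", url)
    content_type = response.getheader("Content-Type")
    conn.close()
    # Assumption: redirects and errors are treated as "not HTML".
    if response.status != 200 or not is_html(content_type):
        return None
    # Step 2: the resource is HTML, so fetch the body with GET.
    response, conn = _request("GET", url)
    try:
        return response.read()
    finally:
        conn.close()
```

You would call it as body = fetch_if_html("http://example.com/") and check for None. Some servers do not handle HEAD well, so if the HEAD request fails you may need to fall back to a GET and inspect the headers before reading the body.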