I'm writing a function that sends a request to a website, reads the response, and parses its content. But when I send a request to Persian sites, it can't decode the content.
from urllib.request import urlopen

def gather_links(page_url):
    html_string = ''
    try:
        response = urlopen(page_url)
        # Only read the body when the server says it is HTML
        if 'text/html' in response.getheader('Content-Type'):
            html_bytes = response.read()
            html_string = html_bytes.decode("utf-8")
    except Exception as e:
        print(str(e))
For example, https://www.entekhab.ir/ gives this error:
'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
How can I change the code so it can decode sites like this too?
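You should use requests instead of urllib. Byte 0x8b at position 1 is the second byte of the gzip magic number (0x1f 0x8b), so the server is most likely sending a gzip-compressed body. urlopen does not decompress it for you, which is why decoding it as UTF-8 fails, while requests decompresses it automatically: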
import requests
response = requests.get('https://www.entekhab.ir/')
print(response.text)
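If you would rather stay with urllib, a minimal sketch is below. It assumes the body is gzip-compressed (as the 0x8b byte suggests) and that the charset is either declared in the Content-Type header or is UTF-8. The names decode_body and fetch_html are hypothetical helpers for illustration, not part of urllib.

```python
import gzip
from urllib.request import urlopen


def decode_body(raw, content_encoding=None, charset=None):
    """Decompress the body if it is gzip, then decode it to str."""
    # Gzip streams always start with the magic bytes 0x1f 0x8b.
    is_gzip = (content_encoding or "").lower() == "gzip" or raw[:2] == b"\x1f\x8b"
    if is_gzip:
        raw = gzip.decompress(raw)
    # Assumption: fall back to UTF-8 when no charset is declared, and
    # replace undecodable bytes instead of raising.
    return raw.decode(charset or "utf-8", errors="replace")


def fetch_html(page_url):
    """Return the page as text, or '' if the response is not HTML."""
    with urlopen(page_url) as response:
        if "text/html" not in (response.getheader("Content-Type") or ""):
            return ""
        return decode_body(
            response.read(),
            response.getheader("Content-Encoding"),
            response.headers.get_content_charset(),
        )
```

Inside gather_links you could then write html_string = fetch_html(page_url) in place of the read and decode lines. Other compression schemes (for example deflate or br) are not handled in this sketch.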