I followed the documentation but I am still getting errors that I can't figure out. I am using Python 3.
Here is my code:
from urllib.request import urlopen
from bs4 import BeautifulSoup
html = urlopen('http://pythonscraping.com/pages/page1.html')
bs = BeautifulSoup(html.read(), "html.parser")
print(bs.h1)
You did everything right.
The URL you supplied uses HTTPS, and the error you get comes from a certificate problem on the website.
If you are trying to learn new things, just change the URL to some other example website.
If you need the result from this specific URL no matter the cost, pass the keyword argument context to your urlopen call with an SSL context that skips certificate verification. Be aware that this removes the protection HTTPS gives you against tampered responses:
from ssl import create_default_context, CERT_NONE
from urllib.request import urlopen
from bs4 import BeautifulSoup
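# Build a context that skips certificate verification (insecure).
context = create_default_context()
context.check_hostname = False  # must be disabled before setting CERT_NONE
context.verify_mode = CERT_NONE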
html = urlopen('http://pythonscraping.com/pages/page1.html', context=context)
bs = BeautifulSoup(html.read(), "html.parser")
print(bs.h1)
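If you end up doing this in more than one place, it may help to wrap the context setup in a small helper. The following is only a sketch: the names make_unverified_context and fetch are made up for illustration and are not part of urllib or ssl. It also shows the one detail that is easy to get wrong: check_hostname has to be turned off before verify_mode is set to CERT_NONE, otherwise Python raises a ValueError.

```python
import ssl
from urllib.request import urlopen


def make_unverified_context():
    """Return an SSL context that skips certificate checks (insecure)."""
    context = ssl.create_default_context()
    # Order matters: hostname checking has to be switched off first,
    # otherwise assigning CERT_NONE raises ValueError.
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    return context


def fetch(url, verify=True):
    """Fetch url and return the body as bytes.

    verify=False (hypothetical flag) disables certificate checks.
    """
    context = None if verify else make_unverified_context()
    with urlopen(url, context=context) as response:
        return response.read()
```

With that in place, the scraping code becomes BeautifulSoup(fetch(url, verify=False), "html.parser"), and verification stays on by default for every other call.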