python, urllib

Followed the urllib.request.urlopen documentation but it is still not working


I followed the documentation, but I am still getting errors that I can't figure out. I am using Python 3.

Here is my code:

from urllib.request import urlopen
from bs4 import BeautifulSoup

html = urlopen('http://pythonscraping.com/pages/page1.html')

bs = BeautifulSoup(html.read(), "html.parser")
print(bs.h1)

(Screenshot: code editor showing the code and the resulting errors.)
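To see the actual cause as text, you can catch the exception that urlopen raises. urlopen wraps low-level connection failures in urllib.error.URLError and keeps the original exception in its reason attribute. The sketch below, assuming Python 3.7+ (where ssl.SSLCertVerificationError exists), checks whether that reason is a certificate verification failure. The helper names is_certificate_error and fetch are hypothetical, chosen for illustration.

```python
import ssl
from urllib.error import URLError
from urllib.request import urlopen


def is_certificate_error(exc: BaseException) -> bool:
    """Return True if exc is, or wraps, an SSL certificate verification failure."""
    # urlopen re-raises connection errors as URLError(original_error),
    # so unwrap the original exception from .reason first.
    if isinstance(exc, URLError):
        exc = exc.reason
    return isinstance(exc, ssl.SSLCertVerificationError)


def fetch(url: str) -> bytes:
    """Download url, printing a readable message if the certificate check fails."""
    try:
        with urlopen(url) as response:
            return response.read()
    except URLError as exc:
        if is_certificate_error(exc):
            print("Certificate verification failed:", exc.reason)
        raise  # keep the original traceback for debugging
```

Calling fetch('http://pythonscraping.com/pages/page1.html') would print the certificate message before re-raising if the problem is indeed the site's certificate.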


Solution

  • You did everything right.

    The URL you supplied is served over HTTPS, and the error you get is caused by a problem with the website's SSL certificate.

    If you are just learning, switch the URL to a different example website.

    If you need the result from this specific URL no matter the cost, pass the context keyword argument to your urlopen call, with an SSL context that turns off certificate verification. Be aware that this leaves the connection open to man-in-the-middle attacks:

    from ssl import create_default_context, CERT_NONE
    from urllib.request import urlopen
    from bs4 import BeautifulSoup

    context = create_default_context()
    # Hostname checking must be off before verify_mode can be set to CERT_NONE.
    context.check_hostname = False
    context.verify_mode = CERT_NONE  # accept the server's certificate without verifying it

    html = urlopen('http://pythonscraping.com/pages/page1.html', context=context)
    bs = BeautifulSoup(html.read(), "html.parser")
    print(bs.h1)
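The order of the two context settings matters. create_default_context() turns hostname checking on, and Python refuses to set verify_mode to CERT_NONE while check_hostname is enabled, raising a ValueError instead. The sketch below shows both the working order and the failing one. The function names make_unverified_context and cert_none_order_error are hypothetical, used only for illustration.

```python
import ssl


def make_unverified_context() -> ssl.SSLContext:
    """Build an SSL context that skips certificate and hostname checks.

    WARNING: this accepts any certificate, so use it only for experiments.
    """
    context = ssl.create_default_context()
    context.check_hostname = False       # must come first
    context.verify_mode = ssl.CERT_NONE  # now allowed
    return context


def cert_none_order_error() -> str:
    """Return the ValueError message raised when the settings are set in the wrong order."""
    context = ssl.create_default_context()  # check_hostname is True here
    try:
        context.verify_mode = ssl.CERT_NONE
    except ValueError as exc:
        return str(exc)
    return ""
```

The context returned by make_unverified_context() can be passed to urlopen through its context keyword argument, exactly as in the answer's code.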