python urllib

urlopen from urllib.request cannot open a page in Python 3.7


I want to write a web scraper that collects article titles from the Medium.com home page. I am using Python 3.7 and imported urlopen from urllib.request, but it cannot open the site and raises the error "urllib.error.HTTPError: HTTP Error 403: Forbidden".

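from bs4 import BeautifulSoup
from urllib.request import urlopen

# The urlopen call below is where the HTTPError is raised
webAdd = urlopen("https://medium.com/")
# Name the parser explicitly so BeautifulSoup does not have to guess one
bsObj = BeautifulSoup(webAdd.read(), "html.parser")

Result: urllib.error.HTTPError: HTTP Error 403: Forbidden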

I expected it to read the page without raising an error.

The error does not occur when I use the requests module:

import requests
from bs4 import BeautifulSoup

url = 'https://medium.com/'
response = requests.get(url, timeout=5)

This time it works without an error.

Why does urlopen fail when requests succeeds?


Solution

  • urllib is a fairly old and minimal module. For web scraping, the requests module is recommended. You can check out this answer for additional information.
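
If staying with urllib is preferred, one thing worth trying is sending an explicit User-Agent header. This rests on an assumption the answer above does not state: that the server rejects urllib's default User-Agent (a string starting with "Python-urllib/") while accepting other clients. The sketch below is only an illustration under that assumption. The helper names and the BROWSER_USER_AGENT value are hypothetical, and the site may still refuse the request.

```python
from urllib.request import Request, build_opener, urlopen

# Hypothetical browser-like value; any realistic User-Agent string could be used.
BROWSER_USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64)"


def default_urllib_user_agent():
    """Return the User-Agent that urllib sends when none is set explicitly."""
    return dict(build_opener().addheaders)["User-agent"]


def build_request(url, user_agent=BROWSER_USER_AGENT):
    """Create a Request object that carries an explicit User-Agent header."""
    return Request(url, headers={"User-Agent": user_agent})


def fetch_html(url, timeout=5):
    """Fetch a page as bytes; urllib.error.HTTPError is still raised on 4xx/5xx."""
    with urlopen(build_request(url), timeout=timeout) as response:
        return response.read()
```

Calling fetch_html("https://medium.com/") and passing the result to BeautifulSoup(html, "html.parser") would mirror the script from the question. Printing default_urllib_user_agent() shows what the original urlopen call identified itself as.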