I want to write a web scraper to collect article titles from the Medium.com webpage.
I am trying to write a Python script that scrapes headlines from the Medium.com website. I am using Python 3.7 and imported urlopen from urllib.request, but it cannot open the site and raises this error:
"urllib.error.HTTPError: HTTP Error 403: Forbidden"
from bs4 import BeautifulSoup
from urllib.request import urlopen

# This call raises urllib.error.HTTPError: HTTP Error 403: Forbidden
webAdd = urlopen("https://medium.com/")
bsObj = BeautifulSoup(webAdd.read(), "html.parser")
Result: urllib.error.HTTPError: HTTP Error 403: Forbidden
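To see what urllib actually sends, you can inspect the headers its default opener adds to every request. This is a minimal standard-library sketch: build_opener() returns the same kind of OpenerDirector that urlopen() uses, and its addheaders attribute holds a single User-Agent entry that identifies the client as "Python-urllib/<major>.<minor>".

```python
import sys
import urllib.request

# build_opener() creates an OpenerDirector like the one urlopen() uses by default
opener = urllib.request.build_opener()
headers = dict(opener.addheaders)

# urllib derives the agent string from the running interpreter's version
expected = "Python-urllib/%d.%d" % sys.version_info[:2]

# On Python 3.7 this prints {'User-agent': 'Python-urllib/3.7'} True
print(headers, headers["User-agent"] == expected)
```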
I expected it to read the website without any error. However, the error does not occur when I use the requests module:
import requests
from bs4 import BeautifulSoup

url = 'https://medium.com/'
response = requests.get(url, timeout=5)
bsObj = BeautifulSoup(response.text, "html.parser")
This time it works without an error.
Why?
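urllib is an older, lower-level module; for web scraping, the requests module is generally recommended.
A likely reason for the different behaviour is the default User-Agent header: urllib identifies itself as "Python-urllib/3.7", which some sites reject with 403 Forbidden, while requests sends "python-requests/<version>" instead. Note also that requests does not raise an exception for a 4xx status on its own; call response.raise_for_status() or check response.status_code to confirm that the request actually succeeded.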
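If the User-Agent is the cause, urllib itself can work once it sends a different agent. The sketch below reproduces the situation locally, under the assumption (not confirmed for Medium) that the server filters on User-Agent: a hypothetical toy server, BlocksUrllibHandler, stands in for the real site and rejects any agent starting with "Python-urllib", and a hypothetical helper, fetch_status, overrides the header through urllib.request.Request. The value "Mozilla/5.0" is only an illustrative placeholder.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.error import HTTPError


class BlocksUrllibHandler(BaseHTTPRequestHandler):
    """Toy stand-in for a site that refuses urllib's default User-Agent."""

    def do_GET(self):
        agent = self.headers.get("User-Agent", "")  # header lookup is case-insensitive
        if agent.startswith("Python-urllib"):
            self.send_response(403)
            self.end_headers()
            return
        body = b"<html><h2>Hello</h2></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


def fetch_status(url, user_agent=None):
    """Return the HTTP status for url, optionally replacing urllib's User-Agent."""
    headers = {"User-Agent": user_agent} if user_agent else {}
    req = urllib.request.Request(url, headers=headers)
    try:
        with urllib.request.urlopen(req, timeout=5) as resp:
            return resp.status
    except HTTPError as err:
        return err.code  # urlopen raises HTTPError for 4xx/5xx; err.code holds the status


# Port 0 lets the OS pick a free port for the local demo server
server = HTTPServer(("127.0.0.1", 0), BlocksUrllibHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/" % server.server_address[1]

default_status = fetch_status(url)                          # urllib's own agent
custom_status = fetch_status(url, user_agent="Mozilla/5.0")  # overridden agent

server.shutdown()
server.server_close()
print(default_status, custom_status)  # prints: 403 200
```

Because urllib only adds its default agent when the request does not already carry a User-Agent header, setting it on the Request object is enough; no custom opener is needed.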
You can check out this answer for additional information.
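Once a page has been fetched (for example as response.text from requests.get in the question), the remaining step toward the original goal is pulling the titles out of the HTML. This sketch swaps BeautifulSoup for the standard library's html.parser so it runs without extra packages, and it assumes, hypothetically, that headlines live in h2 and h3 tags; Medium's real markup may differ and should be checked first. HeadlineParser and extract_titles are illustrative names.

```python
from html.parser import HTMLParser


class HeadlineParser(HTMLParser):
    """Collect the text inside <h2> and <h3> tags (assumed headline tags)."""

    HEADLINE_TAGS = {"h2", "h3"}

    def __init__(self):
        super().__init__()
        self.titles = []
        self._depth = 0     # greater than 0 while inside a headline tag
        self._buffer = []   # text fragments of the current headline

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADLINE_TAGS:
            self._depth += 1

    def handle_endtag(self, tag):
        if tag in self.HEADLINE_TAGS and self._depth > 0:
            self._depth -= 1
            if self._depth == 0:
                # Collapse whitespace so nested tags and line breaks join cleanly
                text = " ".join("".join(self._buffer).split())
                if text:
                    self.titles.append(text)
                self._buffer = []

    def handle_data(self, data):
        if self._depth > 0:
            self._buffer.append(data)


def extract_titles(html):
    """Return the headline strings found in an HTML document."""
    parser = HeadlineParser()
    parser.feed(html)
    parser.close()
    return parser.titles


sample = "<div><h2>First <b>story</b></h2><p>body</p><h3>Second story</h3></div>"
print(extract_titles(sample))  # ['First story', 'Second story']
```

With BeautifulSoup, a roughly equivalent one-liner is [h.get_text(" ", strip=True) for h in bsObj.find_all(["h2", "h3"])].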