I am trying to update a web scraper app that uses Beautiful Soup 4 in Python 3 under Anaconda so that it uses the Requests package instead of urllib, urllib2 and urllib3.
urllib and urllib2 don't appear in the Anaconda channels, and from what I have read the requests package has made urllib and urllib2 obsolete. I am still rather new to Python programming for web scraping, and don't yet fully understand all the concepts and internal subtleties of these 4 packages.
When I replace "urllib2.urlopen()" with "requests.get()", I get an error. Here is the code:
import requests
from bs4 import BeautifulSoup
# replace the following line with "page = requests.get(url)"
# page = urllib2.urlopen(url)
page = requests.get(url)
soup_page = BeautifulSoup(page,"lxml")
I get the following error message, with no explanation, from inside the bs4 module:
  File "C:\ProgramData\Anaconda3\lib\site-packages\bs4\__init__.py", line 246, in __init__
    elif len(markup) <= 256 and (
TypeError: object of type 'Response' has no len()
This error message puts me deep into the bowels of __init__.py in bs4.
I cannot find an explanation of how to port urllib or urllib2 code to requests with Beautiful Soup 4.
Can anyone provide an explicit guide on how to port urllib / urllib2 apps to use requests with Beautiful Soup in Python 3?
Anaconda / conda does not import urllib or urllib2 into Python 3 environments.
Thank you.
Rich
The error occurs because you're passing the Response object itself to BeautifulSoup, which expects the HTML markup as a string (or bytes) and so calls len() on it. Pass the response's text attribute (page.text here) instead of the response object:
# page = urllib2.urlopen(url)
page = requests.get(url)
soup_page = BeautifulSoup(page.text, "lxml")
You may also want to read the Requests documentation.
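For a slightly fuller picture of the port, here is a minimal sketch rather than a definitive recipe. The helper names soup_from_response and fetch_soup, the 10-second timeout, and the default "html.parser" parser are my own assumptions, not anything from your code; pass "lxml" as the parser if you have it installed, as in your snippet.

```python
import requests
from bs4 import BeautifulSoup


def soup_from_response(response, parser="html.parser"):
    """Build a BeautifulSoup tree from a requests.Response-like object.

    Hypothetical helper for illustration only.
    """
    # Raise requests.HTTPError on 4xx/5xx instead of parsing an error page.
    response.raise_for_status()
    # BeautifulSoup wants the markup (str or bytes), not the Response itself.
    # response.text is the decoded body; response.content is the raw bytes.
    return BeautifulSoup(response.text, parser)


def fetch_soup(url, parser="html.parser", timeout=10):
    """Download url with requests and parse it (hypothetical helper)."""
    # Rough equivalent of the old urllib2.urlopen(url) call.
    response = requests.get(url, timeout=timeout)
    return soup_from_response(response, parser)
```

With this in place, soup_page = fetch_soup(url) would stand in for the two lines that fetch and parse the page. Splitting the parsing step out makes it easy to check the logic against a saved HTML string without touching the network.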