How to port a Python urllib2 app (a web scraper) that uses Beautiful Soup 4 to use the requests package instead

I am trying to update a web scraper app that uses Beautiful Soup 4 in Python 3 under Anaconda so that it uses the Requests package instead of urllib, urllib2 and urllib3.

urllib and urllib2 don't exist in the Anaconda channels, and from what I have read, the requests package has made urllib and urllib2 obsolete. I am still rather new to Python programming for web scraping and don't yet fully understand the concepts and internal subtleties of these four packages.

When I replace "urllib2.urlopen()" with "requests.get()", as in the code below, I get an error:

import requests
from bs4 import BeautifulSoup

# Replace the following line with "page = requests.get(url)"
# page = urllib2.urlopen(url)
page = requests.get(url)
soup_page = BeautifulSoup(page, "lxml")

I get the following error message, with no explanation, from the bs4 module:

File "C:\ProgramData\Anaconda3\lib\site-packages\bs4\__init__.py", line 246, in __init__
elif len(markup) <= 256 and (

TypeError: object of type 'Response' has no len()

This error message puts me deep into the bowels of __init__.py in bs4.

I cannot find an explanation of how to port urllib or urllib2 code to requests with Beautiful Soup 4.

Can anyone provide an explicit guide on how to port urllib / urllib2 apps to use requests with Beautiful Soup in Python 3?

Anaconda / conda does not import urllib or urllib2 into Python 3 environments.

Thank you.

Rich

Solution

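The error occurs because BeautifulSoup expects the markup itself (a string, bytes, or a file-like object), while requests.get() returns a Response object, which has no length. urllib2.urlopen() returned a file-like object, which is why the old code worked. Pass page.text, the decoded HTML of the response, instead of the Response object: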

# page = urllib2.urlopen(url)
page = requests.get(url)
soup_page = BeautifulSoup(page.text, "lxml")
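To see why the original call fails, it may help to look at the types involved. The sketch below builds a Response by hand so that it runs without network access. Setting the private _content attribute is a shortcut used only for this illustration, not something to do in real code, and the sample HTML is made up.

```python
import requests

# Hand-built Response so the example needs no network access.
# Assigning to the private _content attribute is for illustration only.
page = requests.Response()
page.status_code = 200
page.encoding = "utf-8"
page._content = b"<html><body><p>hello</p></body></html>"

print(type(page.text))     # the body decoded to str
print(type(page.content))  # the raw body as bytes

try:
    len(page)  # bs4 calls len() on its markup argument, as the traceback shows
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```

Either page.text or page.content can be handed to BeautifulSoup. With page.content, Beautiful Soup works out the encoding itself, which can be useful when the server declares the wrong one.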

For more on the Response object, such as its text, content, and status_code attributes, see the Requests documentation.
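A slightly fuller port might look like the following. This is a minimal sketch, not a definitive recipe: the helper names make_soup and fetch_soup and the 10-second default timeout are assumptions chosen for illustration. One difference worth knowing when porting is that urllib2.urlopen() raised an exception on HTTP error statuses, whereas requests.get() returns the response regardless, so the sketch calls raise_for_status() to get comparable behaviour.

```python
import requests
from bs4 import BeautifulSoup


def make_soup(response, parser="lxml"):
    """Turn a requests.Response into a BeautifulSoup tree (hypothetical helper)."""
    # Raise requests.HTTPError for 4xx/5xx instead of parsing an error page.
    response.raise_for_status()
    # Pass the decoded body, not the Response object itself.
    return BeautifulSoup(response.text, parser)


def fetch_soup(url, parser="lxml", timeout=10):
    """Download url and parse it (hypothetical helper; timeout is an assumed default)."""
    # requests.get() has no default timeout, so set one explicitly.
    response = requests.get(url, timeout=timeout)
    return make_soup(response, parser)
```

With this in place, the two lines in the scraper become soup_page = fetch_soup(url). If lxml is not installed, passing parser="html.parser" falls back to the parser in the standard library.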