I am trying to extract the words (verbs) starting with R from this page, but executing the following code fails:
from bs4 import BeautifulSoup
import urllib2
url = "http://www.usingenglish.com/reference/phrasal-verbs/r.html"
content = urllib2.urlopen(url).read()
soup = BeautifulSoup(content)
print soup.prettify()
The error thrown was something like this:
UnicodeEncodeError: 'charmap' codec can't encode character u'\xa9' in position 57801: character maps to <undefined>
Can someone please tell me what the error is and how to fix and proceed?
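It would be much easier if you showed us the whole stack trace or, at least, the line it points to.
Anyway, I bet the problem is with the last line: print has to encode the Unicode string for your console, and the console's code page (the 'charmap' codec in the message) cannot represent the character u'\xa9' (the copyright sign). Encode the output explicitly instead: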
print(soup.prettify().encode('utf-8'))
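If you want to see the failure in isolation, here is a minimal sketch that needs neither the network nor BeautifulSoup. The sample string and the try_encode helper are made up for illustration, and cp437 is only an assumed stand-in for whatever code page your console actually uses; the point is just that a legacy charmap codec rejects the copyright sign while UTF-8 accepts it.

```python
# -*- coding: utf-8 -*-
# Hypothetical sample text containing the character from the error message.
text = u"Copyright \xa9 UsingEnglish.com"


def try_encode(s, encoding):
    """Return the encoded bytes, or None if the codec cannot represent s."""
    try:
        return s.encode(encoding)
    except UnicodeEncodeError:
        return None


# cp437 (assumed here as an example console code page) has no copyright sign,
# so encoding fails just like the implicit encoding done by print.
legacy = try_encode(text, "cp437")

# UTF-8 can represent every Unicode character, so this succeeds.
utf8 = try_encode(text, "utf-8")

# Lossy alternative: keep the console's encoding, replace what it cannot show.
replaced = text.encode("cp437", "replace")
```

The "replace" variant is handy when you only want readable console output and do not mind unsupported characters turning into question marks; encoding to UTF-8 keeps the data intact, which is the better choice when writing to a file.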