I get the following error when I call BeautifulSoup(page):
Traceback (most recent call last):
  File "error.py", line 10, in <module>
    soup = BeautifulSoup(page)
  File "C:\Python33\lib\site-packages\bs4\__init__.py", line 169, in __init__
    self.builder.prepare_markup(markup, from_encoding))
  File "C:\Python33\lib\site-packages\bs4\builder\_htmlparser.py", line 136, in prepare_markup
    dammit = UnicodeDammit(markup, try_encodings, is_html=True)
  File "C:\Python33\lib\site-packages\bs4\dammit.py", line 223, in __init__
    u = self._convert_from(chardet_dammit(self.markup))
  File "C:\Python33\lib\site-packages\bs4\dammit.py", line 30, in chardet_dammit
    return chardet.detect(s)['encoding']
  File "C:\Python33\lib\site-packages\chardet\__init__.py", line 21, in detect
    import universaldetector
ImportError: No module named 'universaldetector'
I am running Python 3.3 on Windows 7. I installed bs4 by downloading the .tar.gz and running its setup.py. I then installed pip and used it to install chardet (pip.exe install chardet); my chardet version is 2.2.1. bs4 works fine for other URLs.
Here's the code:
import sys
from urllib.request import urlopen
from bs4 import BeautifulSoup
import re
import chardet
url = "http://www.edgar-online.com/brand/yahoo/search/?cik=1400810"
page = urlopen(url).read()
#print(page)
soup = BeautifulSoup(page)
I look forward to your answers.
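I ran into this same situation just now.
Don't import chardet in your script, and uninstall it as well (pip uninstall chardet). After that, the code runs.
The code below is part of dammit.py in the BeautifulSoup library. It shows that bs4 only uses chardet if it can be imported, and otherwise falls back to making no encoding guess.
Most likely the chardet you installed does not support Python 3.3, which is why the error occurs: the traceback fails on "import universaldetector" inside chardet's detect(), a Python 2-style implicit relative import that Python 3 no longer allows.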
try:
    # First try the fast C implementation.
    # PyPI package: cchardet
    import cchardet
    def chardet_dammit(s):
        return cchardet.detect(s)['encoding']
except ImportError:
    try:
        # Fall back to the pure Python implementation
        # Debian package: python-chardet
        # PyPI package: chardet
        import chardet
        def chardet_dammit(s):
            return chardet.detect(s)['encoding']
        #import chardet.constants
        #chardet.constants._debug = 1
    except ImportError:
        # No chardet available.
        def chardet_dammit(s):
            return None
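Why uninstalling helps can be shown with a minimal, stdlib-only sketch. It assumes, as the last traceback frame suggests, that the installed chardet's detect() does a Python 2-style "import universaldetector" at call time. The names make_chardet_dammit, py2_style_chardet and not_installed_chardet are hypothetical; the function only mirrors the fallback shape of the dammit.py excerpt above and does not call bs4 itself.

```python
import importlib
import sys
import types


def make_chardet_dammit(module_name):
    """Mirror dammit.py's fallback: use module_name if it imports, else give up."""
    try:
        detector = importlib.import_module(module_name)
    except ImportError:
        # No detector available: report "no guess" instead of failing.
        def chardet_dammit(s):
            return None
    else:
        # The module imported fine, so a later failure inside detect()
        # is NOT caught by the try/except above.
        def chardet_dammit(s):
            return detector.detect(s)['encoding']
    return chardet_dammit


def _py2_style_detect(buf):
    # Hypothetical stand-in for an old chardet detect(): in Python 3 this is an
    # absolute import of a top-level 'universaldetector', which is not found.
    import universaldetector  # raises ImportError here, at call time
    return {'encoding': None}


# Register the stand-in under a hypothetical name so that importing it succeeds.
stub = types.ModuleType("py2_style_chardet")
stub.detect = _py2_style_detect
sys.modules["py2_style_chardet"] = stub

# Detector not installed (hypothetical name): the fallback returns None.
missing = make_chardet_dammit("not_installed_chardet")
print(missing(b"<html></html>"))  # None

# Detector installed but broken: the ImportError only appears when detect() runs.
broken = make_chardet_dammit("py2_style_chardet")
try:
    broken(b"<html></html>")
except ImportError as exc:
    print("ImportError at call time:", exc)
```

The try/except in dammit.py only guards the import statement. A chardet package whose detect() does the failing import still imports cleanly, so bs4 binds to it and the ImportError surfaces later, inside chardet.detect(). With chardet uninstalled, the import itself fails and the None fallback is used instead.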