I am using chardet 2.0.1 in Python 3.2. The source code is like the one described on this site: http://getpython3.com/diveintopython3/case-study-porting-chardet-to-python-3.html
It can be downloaded here:
http://jaist.dl.sourceforge.net/project/cygwin-ports/release-2/Python/python3-chardet/python3-chardet-2.0.1-2.tar.bz2
I use lxml2 to parse HTML and get some strings, and I use the code below to detect the encoding:
chardet.detect(name)
But an error occurs:
Traceback (most recent call last):
  File "C:\python\test.py", line 125, in <module>
    print(chardet.detect(str(name)))
  File "E:\Python32\lib\site-packages\chardet\__init__.py", line 24, in detect
    u.feed(aBuf)
  File "E:\Python32\lib\site-packages\chardet\universaldetector.py", line 98, in feed
    if self._highBitDetector.search(aBuf):
TypeError: can't use a bytes pattern on a string-like object
name is a string object.
Converting the string to bytes means encoding it with an encoding such as 'utf-8' or 'big5', so chardet would only detect the encoding I chose, not the original string's encoding.
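To make that concrete, here is a minimal stdlib-only sketch (the sample text "中文" is an arbitrary choice, not from the original page): the same str turns into different bytes depending on which codec you pick, so running a detector on those bytes would only rediscover your own choice.

```python
# A str holds decoded text (Unicode code points); it has no encoding of its own.
text = "中文"

# Encoding it means choosing an encoding; different codecs give different bytes.
utf8_bytes = text.encode("utf-8")
big5_bytes = text.encode("big5")

print(len(utf8_bytes), len(big5_bytes))  # 6 4
print(utf8_bytes == big5_bytes)          # False

# Both decode back to the same text, because the encoding was our choice.
print(utf8_bytes.decode("utf-8") == big5_bytes.decode("big5") == text)  # True
```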
I have no idea how to solve this problem.
The problem is obvious: you're calling chardet on a string rather than a bytes object. What you're missing is that, to Python, a string is already decoded. It doesn't have an encoding anymore.
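The TypeError itself can be reproduced without chardet. This sketch assumes, as the error message in the traceback implies, that chardet's high-bit check (`_highBitDetector`) is a regex compiled from a bytes pattern; `high_bit` below is a hypothetical stand-in for it. A bytes pattern can search bytes, but not an already-decoded str.

```python
import re

# Hypothetical stand-in for chardet's high-bit check: a regex compiled
# from a *bytes* pattern, matching any byte in the range 0x80-0xFF.
high_bit = re.compile(b"[\x80-\xff]")

raw = "中文".encode("utf-8")
print(bool(high_bit.search(raw)))  # True: bytes pattern on bytes works

try:
    high_bit.search("中文")  # bytes pattern on a decoded str
except TypeError as exc:
    # Same kind of error as in the traceback (exact wording varies by version).
    print("TypeError:", exc)
```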
You must fix your code so that it gives chardet the original bytes, before they were decoded into a string. If you're getting the string from another package, then that package has already determined the encoding and there's nothing you can do.
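A minimal sketch of that fix, assuming you can get at the raw bytes (for example by opening the HTML file in binary mode, `"rb"`) before anything decodes them. `decode_raw` and its `fallback` parameter are hypothetical names for illustration. It calls `chardet.detect` on bytes and reads the `'encoding'` key of the returned dict; if chardet is not installed, the sketch simply falls back to the given encoding.

```python
try:
    import chardet
except ImportError:  # keep the sketch runnable without chardet installed
    chardet = None


def decode_raw(raw, fallback="utf-8"):
    """Hypothetical helper: detect the encoding of raw bytes, then decode them."""
    if not isinstance(raw, bytes):
        # Detection only makes sense before decoding; a str has no encoding left.
        raise TypeError("pass the original bytes, not a decoded str")
    encoding = None
    if chardet is not None:
        # chardet.detect returns a dict; 'encoding' may be None if it can't tell.
        encoding = chardet.detect(raw)["encoding"]
    return raw.decode(encoding or fallback)


# Example: bytes as they might arrive from a file opened with mode "rb".
page = b"<html><body>hello</body></html>"
print(decode_raw(page))
```

The decoded text can then be handed to the HTML parser. Detection on short inputs is unreliable, so the `'confidence'` value in the same dict is worth checking before trusting the guess.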