I get the following error while trying to search the string below
ERROR:
SyntaxError: Non-ASCII character '\xd8' in file Hadith_scraper.py on line 44, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details
STRING:
دَّثَنَا عَبْدَانُ، قَالَ أَخْبَرَنَا عَبْ
CODE:
arabic_hadith = "دَّثَنَا عَبْدَانُ، قَالَ أَخْبَرَنَا عَبْ"
arabic_hadith.encode('utf8')
print arabic_hadith
if "الجمعة" in arabic_hadith:
    day = "5"
else:
    day = ""
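You have a byte string, not a unicode value. Calling encode() on a byte string in Python 2 means that Python will first try to decode it to unicode, implicitly and with the default ASCII codec, so that it can then encode. That implicit decode fails as soon as the data contains non-ASCII bytes.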
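To make that implicit step visible, here is a minimal sketch that spells it out by hand. The helper name implicit_encode is hypothetical and only imitates what Python 2 does behind the scenes; the two bytes are the UTF-8 encoding of the first letter of your string, which is where the '\xd8' in the error message comes from.

```python
# -*- coding: utf-8 -*-
# UTF-8 bytes of the Arabic letter dal (U+062F), the first character of the string.
raw = b"\xd8\xaf"


def implicit_encode(byte_string, target="utf8"):
    # Hypothetical stand-in for Python 2's str.encode() on a byte string:
    # decode with the default ASCII codec first, then encode to the target.
    return byte_string.decode("ascii").encode(target)


try:
    implicit_encode(raw)
    failed = False
    error_message = ""
except UnicodeDecodeError as exc:
    # The ASCII decode step rejects any byte above 0x7F.
    failed = True
    error_message = str(exc)

# Decoding with the codec the bytes were actually written in works fine.
decoded = raw.decode("utf-8")
```

The failure comes from the decode step, not from the encode you asked for, which is why the exception is a UnicodeDecodeError.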
Use unicode values here instead, and make sure you set the codec at the top of the file first. See PEP 263 - Defining Python Source Code Encodings (which your error message pointed you to).
Note that there is no need to encode to UTF-8 here; that would only complicate text comparisons:
# encoding: utf8
arabic_hadith = u"دَّثَنَا عَبْدَانُ، قَالَ أَخْبَرَنَا عَبْ"
print arabic_hadith
if u"الجمعة" in arabic_hadith:
    day = "5"
else:
    day = ""
Rule of thumb: decode bytes from incoming sources (files, network data) to Unicode, process only Unicode in your program, and only encode again for any outgoing data.
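As a rough sketch of that rule of thumb, the example below decodes at the input boundary, works only on Unicode in the middle, and encodes again on the way out. The function names (decode_input, day_for, encode_output) and the sample phrase are made up for illustration, and UTF-8 is assumed to be the encoding of the incoming data; in real code you would use whatever codec your source actually declares.

```python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals  # plain literals are Unicode on Python 2 too

FRIDAY = "الجمعة"  # the word searched for in the code above


def decode_input(raw, encoding="utf-8"):
    # Boundary in: bytes from a file or the network become Unicode text.
    return raw.decode(encoding)


def day_for(text):
    # Core logic: compares Unicode with Unicode, no codecs involved.
    return "5" if FRIDAY in text else ""


def encode_output(text, encoding="utf-8"):
    # Boundary out: Unicode text becomes bytes only when leaving the program.
    return text.encode(encoding)


# Simulated incoming bytes (hypothetical sample data, assumed to be UTF-8).
incoming = "صلاة الجمعة".encode("utf-8")

text = decode_input(incoming)
day = day_for(text)
outgoing = encode_output(text)
```

Keeping the codec handling in the two boundary functions means the comparison in day_for never has to care how the text was stored.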
I urge you to read:
The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) by Joel Spolsky
Pragmatic Unicode by Ned Batchelder
before you continue.