
Hebrew unicode in Python


I'm trying to make an English-Hebrew dictionary.
I have a dictionary in tab-delimited format (<word>TAB<translation>). In the end, I want it in mobi format. I found a Python script, tab2opf.py, that converts from tab to opf (and HTML files). From there it's easy to convert to mobi.

When I run the original script on my tab (.txt) file, everything works fine.
I use the script's built-in UTF option: tab2opf.py -utf tab.txt

The problem is that I want the dictionary for my Kindle, and the Kindle shows the Hebrew translation backward. So I decided to edit tab2opf.py so that it reverses the translation, which should then display correctly on the Kindle.

I wrote the following code:

def RevIt(s):
    heb = []
    g = ""
    for i in range(len(s)):
        c = s[i]
        heb.append(c)
    for i in range(len(heb)):
        g += heb.pop()
    return g

In tab2opf.py, I added the line dd = RevIt(dd) after line 245.
Now I get garbage:
"-բ לימלՠ£ילחתכՠ©משמהՠתՠאՠמיסՠ,)¨וביחՠמיסՠ:תכבը נסרפמאՠ,& .צעՠשՠ
For comparison, this is how the same line looks in the original txt file:
שם עצם. &, אמפרסנד (בכתב: סימן חיבור), סימן או תו המשמש כתחליף למילה "ו-"

What am I doing wrong?


Solution

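  • You're working with bytes instead of Unicode characters. In UTF-8 each Hebrew letter takes two bytes, so reversing the string byte by byte swaps the bytes inside every letter and produces invalid sequences. Decode the string before reversing it, and build the result as a Unicode string. Inside RevIt, try this:

    g = u""
    s = s.decode('UTF-8')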
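
As a rough sketch of the same idea in one place: the function below decodes the bytes, reverses the resulting text, and encodes it again. The name rev_it is hypothetical, and it assumes the value tab2opf.py passes in at that point is a UTF-8 encoded byte string and that the rest of the script expects bytes back. Neither is confirmed by the question.

```python
# -*- coding: utf-8 -*-

def rev_it(raw):
    """Reverse UTF-8 encoded bytes one character (code point) at a time."""
    text = raw.decode('utf-8')          # bytes -> Unicode text
    return text[::-1].encode('utf-8')   # reverse, then back to UTF-8 bytes


# The first word of the sample line (shin, final mem), as UTF-8 bytes.
word = u"\u05e9\u05dd".encode('utf-8')

# Reversing the raw bytes splits each two-byte letter apart.
broken = word[::-1]

# Decoding first keeps each letter intact.
fixed = rev_it(word)
```

Here broken is no longer valid UTF-8, which is the kind of garbage shown in the question, while fixed holds the same two letters in the opposite order. Note that reversing code point by code point also flips any embedded Latin text, digits and brackets, and would separate vowel points (niqqud) from their base letters if the translations contain them, so the result may still need adjusting.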