python-3.x character-encoding

How can I convert ISCII-encoded text to Unicode for the Gujarati language in Python 3?


I have some Gujarati text, but it's in ISCII encoding, so Python throws an error (SyntaxError: invalid decimal literal):

    string = TFH[TZDF\ I]GF.8[0 G[Xg;

    line 1
        string = TFH[TZDF\ I]GF.8[0 G[Xg;
                          ^
    SyntaxError: unexpected character after line continuation character
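The SyntaxError itself comes from the text not being inside quotes, so Python tries to parse it as code and reads the `\` as a line-continuation character. A minimal sketch, assuming the text is pasted directly into the source file: wrap it in a raw string so the backslash is kept literally. This only removes the SyntaxError; the characters are still the legacy ASCII keys, not Gujarati.

```python
# Raw string: the backslash in the legacy text is kept as a literal "\"
# instead of being treated as an escape or line continuation.
text = r"TFH[TZDF\ I]GF.8[0 G[Xg;"

print(text)       # the characters exactly as typed
print(len(text))  # 24
```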

I tried byte encoding too, but it doesn't give the output I expect from ISCII.
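Encoding to bytes is unlikely to help here. Python's standard library does not ship an ISCII codec, and the sample contains only ASCII characters, whereas ISCII keeps ASCII in 0x00 to 0x7F and places its Indic characters above 0x7F. That suggests the text was typed with a legacy font that draws Gujarati glyphs on ASCII keys, so it needs a character-by-character mapping rather than a decode. A quick check, as a sketch:

```python
import codecs

text = r"TFH[TZDF\ I]GF.8[0 G[Xg;"

# Every character is plain ASCII (code point < 128), so there are no
# ISCII-range bytes (above 0x7F) for a decoder to interpret.
print(text.isascii())  # True

# The standard library does not register an "iscii" codec.
try:
    codecs.lookup("iscii")
    has_iscii_codec = True
except LookupError:
    has_iscii_codec = False
print(has_iscii_codec)  # False
```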

I'm trying to convert ISCII to Unicode for Gujarati. I also have the ISCII-based font and its character-map data.

ISCII input string: TFH[TZDF\ I]GF.8[0 G[Xg;
Desired Unicode output: તાજેતરમાં યુનાઇટેડ નેશન્સ (typed using a Gujarati phonetic keyboard)


Solution

  • If you just want to write the string literal, simply writing print("તાજેતરમાં યુનાઇટેડ નેશન્સ") works for me, since Python 3 source files are UTF-8 by default. Or you could build it from its code points:

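    # Unicode code points for "તાજેતરમાં યુનાઇટેડ નેશન્સ" (Gujarati block U+0A80 to U+0AFF; 32 is a space)
    characters = [2724, 2750, 2716, 2759, 2724, 2736, 2734, 2750, 2690, 32, 2735, 2753, 2728, 2750, 2695, 2719, 2759, 2721, 32, 2728, 2759, 2742, 2728, 2765, 2744]
    string = "".join(chr(c) for c in characters)
    print(string)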
    

    You might also take a look at this conversion script: https://gist.github.com/pathumego/81672787807c23f19518c622d9e7ebb8
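Another option is a table-driven conversion using the character-map data mentioned in the question, since the sample looks like ASCII keys typed in a legacy Gujarati font rather than true ISCII bytes. The sketch below is hypothetical: its partial table (`LEGACY_TO_GUJARATI`, a name made up here) was inferred solely by lining up the sample input against the desired output, so it covers only those 16 characters and should be replaced with your full character map. `str.maketrans` accepts a dict whose values may be multi-character strings, which handles a key like `g` that stands for a half form (NA + virama). It assumes a one-to-one, left-to-right mapping; many legacy fonts also store pre-base vowel signs such as િ before the consonant, which would need an extra reordering pass not shown here.

```python
# Partial legacy-font-to-Unicode table, inferred ONLY from the single
# sample pair in the question (hypothetical and incomplete; replace it
# with the full character-map data for your font).
LEGACY_TO_GUJARATI = {
    "T": "\u0aa4",        # ત LETTER TA
    "F": "\u0abe",        # ા VOWEL SIGN AA
    "H": "\u0a9c",        # જ LETTER JA
    "[": "\u0ac7",        # ે VOWEL SIGN E
    "Z": "\u0ab0",        # ર LETTER RA
    "D": "\u0aae",        # મ LETTER MA
    "\\": "\u0a82",       # ં SIGN ANUSVARA
    "I": "\u0aaf",        # ય LETTER YA
    "]": "\u0ac1",        # ુ VOWEL SIGN U
    "G": "\u0aa8",        # ન LETTER NA
    ".": "\u0a87",        # ઇ LETTER I
    "8": "\u0a9f",        # ટ LETTER TTA
    "0": "\u0aa1",        # ડ LETTER DDA
    "X": "\u0ab6",        # શ LETTER SHA
    "g": "\u0aa8\u0acd",  # ન્ LETTER NA + SIGN VIRAMA (half form)
    ";": "\u0ab8",        # સ LETTER SA
}

# Build the translation table once; str.translate then maps each character.
_TABLE = str.maketrans(LEGACY_TO_GUJARATI)


def legacy_to_unicode(text: str) -> str:
    """Map each legacy-font character to its Unicode Gujarati equivalent.

    Characters missing from the table (e.g. spaces) pass through unchanged.
    """
    return text.translate(_TABLE)


if __name__ == "__main__":
    print(legacy_to_unicode(r"TFH[TZDF\ I]GF.8[0 G[Xg;"))  # → તાજેતરમાં યુનાઇટેડ નેશન્સ
```

For the sample input, this produces the same string as the code-point list in the answer above.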