python, python-2.7, python-unicode

String.maketrans for English and Persian numbers


I have a function like this:

persian_numbers = '۱۲۳۴۵۶۷۸۹۰'
english_numbers = '1234567890'
arabic_numbers  = '١٢٣٤٥٦٧٨٩٠'

english_trans   = string.maketrans(english_numbers, persian_numbers)
arabic_trans    = string.maketrans(arabic_numbers, persian_numbers)

text.translate(english_trans)
text.translate(arabic_trans)

I want it to translate all Arabic and English numbers to Persian. But Python says:

english_translate = string.maketrans(english_numbers, persian_numbers)
ValueError: maketrans arguments must have same length

I tried encoding the strings as UTF-8, but I always got errors. Sometimes the Arabic string is the one that causes the problem instead. Do you know a better solution for this job?

EDIT:

It seems the problem is the length of the Unicode characters once they are stored in a byte string. An Arabic-script digit like '۱' takes two bytes rather than one, which I found out with ord(). That is where the length problem starts :-(
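The length mismatch described in this edit can be reproduced directly. The following is a minimal sketch, assuming Python 3 and UTF-8 encoding; the variable names are illustrative and not part of the original code.

```python
# U+06F1 is EXTENDED ARABIC-INDIC DIGIT ONE, the Persian digit for 1.
persian_one = '\u06f1'

as_text = persian_one                   # Unicode text: a single code point
as_bytes = persian_one.encode('utf-8')  # the same digit encoded as UTF-8 bytes

print(len(as_text))   # 1
print(len(as_bytes))  # 2
```

A byte string holding ten such digits is therefore 20 bytes long, while '1234567890' is 10, which is consistent with the "same length" error from a byte-oriented maketrans.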


Solution

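  • See the unidecode library, which transliterates Unicode text to its closest ASCII representation. It is very useful when numbers may be entered in different scripts. Note that it normalizes digits to ASCII (English) digits, not to Persian ones.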

    In Python 2:

    >>> from unidecode import unidecode
    >>> a = unidecode(u"۰۱۲۳۴۵۶۷۸۹")
    >>> a
    '0123456789'
    >>> unidecode(a)
    '0123456789'
    

    In Python 3:

    >>> from unidecode import unidecode
    >>> a = unidecode("۰۱۲۳۴۵۶۷۸۹")
    >>> a
    '0123456789'
    >>> unidecode(a)
    '0123456789'
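If the goal is Persian digits rather than ASCII ones, as the question asks, a translation table built on Unicode strings is one possible approach. The following is a minimal sketch, assuming Python 3, where str.maketrans operates on code points rather than bytes; the names TO_PERSIAN and to_persian_digits are hypothetical and chosen only for illustration.

```python
# Build each digit set from its Unicode block so that position i always
# holds the digit i in all three strings.
english_digits = '0123456789'
persian_digits = ''.join(chr(0x06F0 + i) for i in range(10))  # U+06F0..U+06F9
arabic_digits = ''.join(chr(0x0660 + i) for i in range(10))   # U+0660..U+0669

# Both arguments are 20 code points long, so the length check passes.
TO_PERSIAN = str.maketrans(english_digits + arabic_digits, persian_digits * 2)


def to_persian_digits(text):
    """Replace English and Arabic-Indic digits in text with Persian digits."""
    return text.translate(TO_PERSIAN)


# Usage: mixed English and Arabic-Indic digits; other characters pass through.
print(to_persian_digits('Room 42, floor \u0663'))
```

In Python 2.7 there is no str.maketrans; there, a dict mapping code point ordinals to replacement characters, passed to unicode.translate, should play the same role as long as every string involved is a unicode object.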