Tags: python, php, internationalization, icu, transliteration

Python transliterator that uses the same rules as the PHP one


I need a transliterator for Python that is configured the same way as my PHP one. The PHP transliterator is built from these rules:

$transliterator = Transliterator::createFromRules(
    ':: NFD;'
    . ' :: [:Nonspacing Mark:] Remove;'
    . ' :: NFC;'
    . ' :: [:Punctuation:] Remove;'
    . ' :: Lower();',
    Transliterator::FORWARD
);

For now I am using the slugify library in Python, which gives a close but not identical result. Because of this mismatch, any text that must be transliterated the same way on both sides has to go through the PHP back end, via an API endpoint that returns the transliterated string.

Is there any way to run the same rules directly in Python?


Solution

  • Use PyICU, a Python wrapper around icu4c.

    Assuming you already have icu4c installed and accessible to Python, install PyICU:

    pip install -U PyICU
    

    The syntax is virtually identical between PyICU and PHP. The only real difference is that PyICU requires a label (an ID) for the transliterator as the first argument:

    icu.Transliterator.createFromRules(label, rules, direction)

    So:

    import icu
    rules = (
        ':: NFD;'
        ' :: [:Nonspacing Mark:] Remove;'
        ' :: NFC;'
        ' :: [:Punctuation:] Remove;'
        ' :: Lower();'
    )
    direction = icu.UTransDirection.FORWARD
    transliterator = icu.Transliterator.createFromRules("customClean", rules, direction)
    s = "Nāgārjuna!"
    print(transliterator.transliterate(s))
    # nagarjuna
    

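    More generally, PyICU and PHP's intl extension are both bindings to the same ICU library, so most of what intl offers has a PyICU equivalent. Results can still differ slightly if the two sides are built against different ICU versions.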
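
If installing icu4c is not an option, the same rule chain can be approximated with Python's standard `unicodedata` module. This is only a sketch, not a drop-in replacement for ICU: the function name `clean` is made up for illustration, and it assumes that `[:Nonspacing Mark:]` corresponds to general category `Mn` and `[:Punctuation:]` to the categories starting with `P`. Python's Unicode tables and `str.lower()` may differ from ICU in edge cases, so compare the output against PHP before relying on it.

```python
import unicodedata


def clean(text: str) -> str:
    """Approximate the ICU rule chain using only the standard library."""
    # :: NFD;  decompose so accents become separate combining characters
    decomposed = unicodedata.normalize("NFD", text)
    # :: [:Nonspacing Mark:] Remove;  drop general category Mn (assumed mapping)
    stripped = "".join(
        ch for ch in decomposed if unicodedata.category(ch) != "Mn"
    )
    # :: NFC;  recompose whatever is left
    composed = unicodedata.normalize("NFC", stripped)
    # :: [:Punctuation:] Remove;  drop categories Pc, Pd, Ps, Pe, Pi, Pf, Po
    no_punct = "".join(
        ch for ch in composed if not unicodedata.category(ch).startswith("P")
    )
    # :: Lower();  str.lower() stands in for ICU's Lower transform
    return no_punct.lower()


print(clean("Nāgārjuna!"))
# nagarjuna
```

For text that has to match the PHP output exactly, the PyICU transliterator above is the safer choice, since it runs the same rules through ICU itself.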