I need a transliterator for Python that is configured the same way as my PHP one. My PHP-based transliterator is created from these rules:
$transliterator = Transliterator::createFromRules(
':: NFD;'
. ' :: [:Nonspacing Mark:] Remove;'
. ' :: NFC;'
. ' :: [:Punctuation:] Remove;'
. ' :: Lower();',
Transliterator::FORWARD
);
At the moment I am using the slugify library for Python to get a close-enough result. Because the two implementations differ, any text that must be transliterated identically in PHP and Python has to be sent to an API endpoint on the PHP back-end, which returns the transliterated string.
Is there any way to get the same transliteration directly in Python?
Use PyICU, a Python wrapper around icu4c.
Assuming you already have icu4c installed and accessible to Python, install PyICU:
pip install -U PyICU
Syntax is virtually identical between PyICU and PHP. The only real difference is that you need to add a label for the transliterator:
icu.Transliterator.createFromRules(label, rules, direction)
So:
import icu
# Same rule string as in the PHP version; adjacent string literals are concatenated
rules = (
':: NFD;'
' :: [:Nonspacing Mark:] Remove;'
' :: NFC;'
' :: [:Punctuation:] Remove;'
' :: Lower();'
)
direction = icu.UTransDirection.FORWARD
# "customClean" is an arbitrary label for this transliterator
transliterator = icu.Transliterator.createFromRules("customClean", rules, direction)
s = "Nāgārjuna!"
print(transliterator.transliterate(s))
# nagarjuna
More generally, PyICU offers functionality equivalent to PHP's intl extension, since both are wrappers around ICU.
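If installing icu4c is not an option, the five rules can be approximated with the standard library alone. This is only a sketch: the function name clean_text is hypothetical, and it assumes that [:Nonspacing Mark:] corresponds to Unicode general category Mn and [:Punctuation:] to the categories starting with P. Python's unicodedata may ship a different Unicode version than your ICU build, so edge cases could differ from the PHP output.

```python
import unicodedata


def clean_text(text: str) -> str:
    """Approximate the ICU rule chain using only the standard library."""
    # :: NFD;  decompose characters into base letters plus combining marks
    text = unicodedata.normalize("NFD", text)
    # :: [:Nonspacing Mark:] Remove;  drop general category Mn
    text = "".join(ch for ch in text if unicodedata.category(ch) != "Mn")
    # :: NFC;  recompose whatever is left
    text = unicodedata.normalize("NFC", text)
    # :: [:Punctuation:] Remove;  drop every category starting with "P"
    text = "".join(ch for ch in text if not unicodedata.category(ch).startswith("P"))
    # :: Lower();  lowercase the result
    return text.lower()


print(clean_text("Nāgārjuna!"))
# nagarjuna
```

For strings that must match the PHP result exactly, the PyICU version remains the safer choice, because it runs the same ICU rules rather than a reimplementation of them.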