python text unicode

How to convert Preeti-encoded text to Unicode in Python while leaving English words as they are?


I am trying to convert Preeti font text to Unicode using Python and the npTTF2UTF library. It works most of the time, but it also converts English words such as 'Microwave' into their Unicode equivalents. How can I avoid that?

Here is my code:

import npttf2utf

mapper = npttf2utf.FontMapper("npttf2utf-main/src/npttf2utf/map.json")

text = ''' cGo g]6js{;Fu cfj4 x"g] :yfg, tl/sf / lsl;d like (Microwave/Satellite/Cable etc.)  '''

converted_text = mapper.map_to_unicode(text, from_font="Preeti", unescape_html_input=False, escape_html_output=False)

print(converted_text)

I get: अन्य नेटवर्कसँग आवद्ध हूने स्थान, तरिका र किसिम ष्पिभ ९ःष्अचयधबखभरक्बतभििष्तभरऋबदभि भतअ।०

I don't want the text after 'lsl;d' (किसिम) to be converted to Unicode. How can I do that?


Solution

  • Extract the text together with its font information (PyMuPDF has this feature) and convert only the parts that are set in the Preeti font.
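
A minimal sketch of that idea, assuming the text originally comes from a PDF (the question does not say so). It reads each text span and its font name with PyMuPDF's page.get_text("dict"), then passes only the spans whose font name contains "preeti" to a converter. The helper names extract_spans and convert_preeti_spans, the font_marker parameter and the file name input.pdf are made up for illustration. The substring match on the font name is also an assumption, since embedded fonts often carry a subset prefix such as "ABCDEF+Preeti".

```python
from typing import Callable, Iterable, List, Tuple

Span = Tuple[str, str]  # (font name, text)


def extract_spans(pdf_path: str) -> List[Span]:
    """Return (font, text) pairs for every text span in the PDF, page by page."""
    import fitz  # PyMuPDF; imported here so the converter below works without it

    spans: List[Span] = []
    with fitz.open(pdf_path) as doc:
        for page in doc:
            for block in page.get_text("dict")["blocks"]:
                # Image blocks have no "lines" key, so they are skipped.
                for line in block.get("lines", []):
                    for span in line["spans"]:
                        spans.append((span["font"], span["text"]))
                    # Keep line breaks; the empty font name never matches Preeti.
                    spans.append(("", "\n"))
    return spans


def convert_preeti_spans(
    spans: Iterable[Span],
    to_unicode: Callable[[str], str],
    font_marker: str = "preeti",  # assumed to appear in the embedded font name
) -> str:
    """Convert only the spans set in a Preeti font; pass everything else through."""
    parts = []
    for font, text in spans:
        if font_marker in font.lower():
            parts.append(to_unicode(text))
        else:
            parts.append(text)
    return "".join(parts)


# Possible usage with the mapper from the question ("input.pdf" is hypothetical):
#
#   import npttf2utf
#   mapper = npttf2utf.FontMapper("npttf2utf-main/src/npttf2utf/map.json")
#   to_unicode = lambda s: mapper.map_to_unicode(
#       s, from_font="Preeti", unescape_html_input=False, escape_html_output=False
#   )
#   print(convert_preeti_spans(extract_spans("input.pdf"), to_unicode))
```

The converter is passed in as a plain callable, so the span filtering can be checked without the mapping file. This only helps when the font information is still available: a bare string like the one in the question carries no font data, so the split has to happen at extraction time.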