Tags: python, flask, arabic, arabic-support, bidi

Python: Arabic text is returned in right-to-left orientation instead of left-to-right


I'm working on a Python project with Python 3.6 and Flask in which I have to return Arabic text. When I print the text in the console it looks right, but when I return it in the response the order of its characters is reversed (right-to-left).

Here's what I have tried:

from odoa import ODOA
import arabic_reshaper
from bidi.algorithm import get_display
from flask import Flask
import json

app = Flask(__name__)
app.config['JSON_AS_ASCII'] = False

@app.route('/', methods=['GET'])
def get_an_ayah():
    odoa = ODOA()
    surah = odoa.get_random_surah(lang='en')
    text = surah.ayah.decode("utf-8")
    reshaped_text = arabic_reshaper.reshape(text)    # substitute contextual letter forms
    arabic_text = get_display(reshaped_text, base_dir='R')    # reorder into visual order
    print(arabic_text)
    translation = str(surah.translate)
    sound_file_url = str(surah.sound)
    reference = str(str(surah.surah_number) + ':' + str(surah.ayah_number))
    response_dict = {
        'text': arabic_text,
        'translation': translation,
        'sound': sound_file_url,
        'ref': reference
    }

    return response_dict

Result of print(arabic_text):

[Screenshot of the console output, not reproduced here]

And here is how it appears in the response:

{
    "ref": "94:2",
    "sound": "https://raw.githubusercontent.com/semarketir/quranjson/master/source/audio/094/002.mp3",
    "text": "ﻙﺭﺯﻭ ﻚﻨﻋ ﺎﻨﻌﺿﻭﻭ",
    "translation": "And lift from you your burden."
}

How can I get the correct orientation for the Arabic text?


Solution

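  • Do you really need arabic_reshaper and python-bidi? They are meant for terminals and other output that lack Arabic shaping and bidi support. arabic_reshaper.reshape() replaces each letter with its contextual presentation form, and get_display() reorders the string from logical order into visual (left-to-right) order. A browser or any other Unicode-aware client already applies shaping and the bidirectional algorithm itself. When it receives a string that has already been reordered, it reverses it a second time and the words come out backwards. That is why the console output looks right while the JSON does not.

    Try the following code instead: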

    from odoa import ODOA
    from flask import Flask
    
    app = Flask(__name__)
    # Emit real UTF-8 characters instead of \uXXXX escapes. This config key
    # was removed in Flask 2.3; on Flask 2.2+ use app.json.ensure_ascii = False.
    app.config['JSON_AS_ASCII'] = False
    
    @app.route('/', methods=['GET'])
    def get_an_ayah():
        odoa = ODOA()
        surah = odoa.get_random_surah(lang='en')
        # Keep the Arabic in logical order and unshaped; the client's text
        # renderer handles letter joining and right-to-left display.
        text = surah.ayah.decode("utf-8")
        response_dict = {
            'text': text,
            'translation': str(surah.translate),
            'sound': str(surah.sound),
            'ref': f'{surah.surah_number}:{surah.ayah_number}',
        }
        # Flask 1.1+ serializes a returned dict to JSON automatically.
        return response_dict
    

    The returned text will include Arabic diacritics (harakat) and Quranic annotation marks. If you want to remove them, use the following regex replacement:

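    import re
    
    # Arabic diacritics and Quranic annotation marks. The end-of-ayah,
    # rub el hizb and sajdah signs (U+06DD, U+06DE, U+06E9) and U+08E2
    # fall outside these ranges, so they are kept.
    PATTERN = re.compile(
        '['
        '\u0610-\u061a'  # honorific signs and small high marks
        '\u064b-\u065f'  # harakat (tashkeel) and other combining marks
        '\u0670'         # superscript alef
        '\u06d6-\u06dc'  # small high ligatures (Quranic pause signs)
        '\u06df-\u06e8'  # Quranic annotation signs
        '\u06ea-\u06ed'  # Quranic stop signs, small low meem
        '\u08d4-\u08e1'  # Arabic Extended-A small high words and signs
        '\u08e3-\u08ff'  # Arabic Extended-A combining marks
        ']'
    )
    
    # `text` is the decoded ayah from the route above.
    letters_only_text = PATTERN.sub('', text)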
    

    See the Unicode code charts for the Arabic block (U+0600 to U+06FF) and the Arabic Extended-A block (U+08A0 to U+08FF) for details on these ranges.
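The sketch below shows why the question's response looks reversed. It takes the `text` value from the JSON above and undoes both transformations using only the standard library. Reversing the string restores logical (reading) order, and NFKC normalization maps the Arabic presentation forms back to ordinary letters. The function name `visual_to_logical` is hypothetical. The shortcut assumes a single line of purely right-to-left text (Arabic letters and spaces only), so treat it as a diagnostic rather than a substitute for returning the unprocessed text.

```python
import unicodedata

# The "text" value from the question's JSON response, written with escapes:
# Arabic Presentation Forms-B code points in visual (left-to-right) order,
# as produced by arabic_reshaper.reshape() followed by get_display().
VISUAL = "\ufed9\ufead\ufeaf\ufeed \ufeda\ufee8\ufecb \ufe8e\ufee8\ufecc\ufebf\ufeed\ufeed"


def visual_to_logical(visual: str) -> str:
    """Hypothetical helper: undo shaping and reordering for one RTL line.

    Assumption: the line holds only Arabic letters and spaces. The bidi
    algorithm does not simply reverse mixed text (Latin words, digits),
    so this shortcut would scramble such input.
    """
    # 1. Reverse the visual order back into logical (reading) order.
    logical_shaped = visual[::-1]
    # 2. NFKC maps each contextual presentation form (isolated, initial,
    #    medial, final) to its base letter in the Arabic block.
    return unicodedata.normalize("NFKC", logical_shaped)


logical = visual_to_logical(VISUAL)
print(logical)
# Every code point is now a base Arabic letter (bidi class "AL") or a
# space ("WS"), which a bidi-aware client lays out right-to-left itself.
print({unicodedata.bidirectional(ch) for ch in logical})
```

The recovered string is the ordinary logical-order Arabic that the endpoint should have sent in the first place. This sketch can only recover characters that are still present in the processed string.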