python fonts python-imaging-library arabic

How to force Arabic characters to be separate?

I'm trying to draw a sequence of Arabic characters, without spaces, on an image using Pillow. The problem is that some Arabic characters change shape when they are next to each other compared with when they stand alone (e.g. س and ل become سل when put next to each other). I'd like to force my font settings to always keep all characters separate, without injecting any other characters. What should I do?

Here is a snippet of my code:

        # font is an Arabic font, and font_path points to its location.
        font = ImageFont.truetype(
            font=font_path, size=size,
            layout_engine=ImageFont.LAYOUT_RAQM)

        # getsize() returns (width, height), in that order.
        w, h = font.getsize(text, direction='rtl')
        offset = font.getoffset(text)
        W, H = int(1.5 * w), int(1.5 * h)
        imgSize = W, H
        img = Image.new(mode='1', size=imgSize, color=0)
        draw = ImageDraw.Draw(img)
        pos = ((W-w)/2, (H-h)/2)
        draw.text(pos, text, fill=255, font=font,
                  direction='rtl', align='center')

Solution

What you're describing might be possible with some fonts that support Arabic, specifically those that encode the position-sensitive forms in the Arabic Presentation Forms-B block of Unicode. You would need to map the character codes of your input text to the correct positional variants. For your example characters seen and lam, U+0633 س and U+0644 ل, you want the initial form of U+0633, which is U+FEB3 ﺳ, and the final form of U+0644, which is U+FEDE ﻞ. Putting those together (separated by a regular space) gives: ﺳ ﻞ.

There is a useful chart showing the positional forms at https://en.wikipedia.org/wiki/Arabic_script_in_Unicode#Contextual_forms.
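
If typing the codes in by hand from the chart is too tedious, the same mapping can probably be derived in Python. The following is only a minimal sketch, not something Pillow provides: it relies on the standard unicodedata module, whose decomposition() data tags each Presentation Forms-B character with <isolated>, <initial>, <medial> or <final> followed by its base character. The names build_form_table and to_isolated are made up for this illustration, and it assumes the isolated forms are what you want for letters that should always stand apart.

```python
import unicodedata


def build_form_table(tag):
    """Map base Arabic code points to their Presentation Forms-B variant.

    `tag` is one of "isolated", "initial", "medial" or "final".
    (Hypothetical helper, named for this example only.)
    """
    table = {}
    # U+FE70..U+FEFF is the Arabic Presentation Forms-B block.
    for cp in range(0xFE70, 0xFF00):
        parts = unicodedata.decomposition(chr(cp)).split()
        # Keep one-to-one mappings only; this skips ligatures such as
        # lam-alef, whose decomposition lists two base characters.
        if len(parts) == 2 and parts[0] == f"<{tag}>":
            table.setdefault(int(parts[1], 16), cp)
    return table


ISOLATED = build_form_table("isolated")


def to_isolated(text):
    """Replace each character that has an isolated presentation form."""
    return text.translate(ISOLATED)
```

Characters that have no single-character presentation form are left unchanged, so they may still join. The converted string would then be passed to draw.text() in place of the original text; how a given font and layout engine treat these code points is something to check for your own setup.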

But it is important to understand:

Not all fonts that contain Arabic have the Presentation Forms encoded (many fonts do not).
Not all Arabic codes have an equivalent in the Presentation Forms range (most of the basic ones do, but some extended Arabic characters used for other languages do not).
You are responsible for converting your input text (in the U+06xx range) into the correct presentation-form codes (U+FExx range) based on the word/group context, which can be tricky. That job normally falls to an OpenType Layout engine, but the engine also performs the joining, so you are basically overriding that logic.
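
Regarding the first caveat, one way to find out whether a particular font encodes the forms you need is to look at its character map before drawing. This is a rough sketch under assumptions: load_cmap uses the third-party fontTools package (not Pillow), and both function names are hypothetical, chosen just for this example.

```python
def load_cmap(font_path):
    """Return the set of code points the font encodes.

    Assumes the third-party fontTools package is installed; it is
    imported here so the check below works without it.
    """
    from fontTools.ttLib import TTFont
    # getBestCmap() maps code points to glyph names (None if no Unicode cmap).
    return set(TTFont(font_path).getBestCmap() or {})


def unsupported(text, encoded):
    """Return the characters of `text` whose code points are not in `encoded`."""
    return sorted({ch for ch in text if ord(ch) not in encoded})
```

For example, unsupported(converted_text, load_cmap(font_path)) returning an empty list would suggest the font has a glyph for every presentation form in your string; anything it does return would likely be drawn as a missing-glyph box.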