OpenXml RTL layout


I have an OpenXML file, and I don't fully understand how Word lays out the letters in this document.

<w:p>
    <w:pPr>
        <w:bidi/>
    </w:pPr>
    <w:r>
        <w:t>W</w:t>
    </w:r>
    <w:r>
        <w:rPr>
            <w:rtl/>
        </w:rPr>
        <w:t>T</w:t>
    </w:r>
    <w:r>
        <w:t>J</w:t>
    </w:r>
</w:p>

Why is the result JTW?

I tried rewriting the markup as

<w:p>
    <w:pPr>
        <w:bidi/>
    </w:pPr>
    <w:r>
        <w:t>W</w:t>
    </w:r>
    <w:r>
        <w:t>T</w:t>
    </w:r>
    <w:r>
        <w:t>J</w:t>
    </w:r>
</w:p>

and now Word shows it as WTJ, and I don't understand why.


Solution

  • Dealing with bidirectional text is quite complicated; you only have to read the Unicode Bidirectional Algorithm (UAX #9) to get the idea. I don't know whether Word's algorithm is identical, but I think it is probably just as complicated.

    As I understand it, the way to see this is that when sequencing text, Word does not process each run's text independently, but combines adjacent runs that have "the same direction" into chunks. In a bidi paragraph, it then puts the first chunk at the right, the next chunk to the left of that, and so on.

    So let's say you have an RTL paragraph and you enter the text "ABCDEFGHI" in Word using an LTR keyboard/input method. Then the text looks like ABCDEFGHI, and it will probably be represented in the XML in a single run, e.g.

    <w:r><w:t>ABCDEFGHI</w:t></w:r>
    

    But if you select the DEF and make it Bold, Word will need to break up the run into three runs, so you have something like this:

    <w:r><w:t>ABC</w:t></w:r>
    <w:r><w:rPr><w:b/></w:rPr><w:t>DEF</w:t></w:r>
    <w:r><w:t>GHI</w:t></w:r>
    

    But the text you see is still displayed as ABCDEFGHI, not GHIDEFABC.

    Now make the DEF RTL, which you can do by selecting it and using the VB Editor's Immediate window to execute

    Selection.RtlRun
    

    Now you see that the paragraph looks like GHIDEFABC, and the XML looks more like

    <w:r><w:t>ABC</w:t></w:r>
    <w:r><w:rPr><w:b/><w:rtl/></w:rPr><w:t>DEF</w:t></w:r>
    <w:r><w:t>GHI</w:t></w:r>
    

    i.e. a similar structure to your first example.
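
The chunking model described above can be sketched in a few lines of Python. This is only a toy model of the behaviour this answer describes, not Word's actual algorithm: the `Run` tuple and the `visual_order` function are hypothetical names invented for illustration, and the sketch assumes every run contains only Latin letters, so the text inside a chunk is never reversed.

```python
from itertools import groupby
from typing import List, Tuple

# Hypothetical stand-in for a <w:r>: (text, True if the run has <w:rtl/>).
Run = Tuple[str, bool]


def visual_order(runs: List[Run], paragraph_is_bidi: bool = True) -> str:
    """Toy model: merge adjacent runs that share a direction flag into
    chunks, then place the chunks right to left in a bidi paragraph."""
    chunks = [
        "".join(text for text, _ in group)
        for _, group in groupby(runs, key=lambda run: run[1])
    ]
    if paragraph_is_bidi:
        # The first chunk sits at the right edge, so reading the line
        # from left to right yields the chunks in reverse order.
        chunks.reverse()
    return "".join(chunks)


first_example = [("W", False), ("T", True), ("J", False)]
second_example = [("W", False), ("T", False), ("J", False)]
abc_example = [("ABC", False), ("DEF", True), ("GHI", False)]

print(visual_order(first_example))   # JTW
print(visual_order(second_example))  # WTJ
print(visual_order(abc_example))     # GHIDEFABC
```

Under this model the question's first paragraph has three chunks (W, T, J) and so reads JTW, while the second paragraph collapses into a single chunk and reads WTJ.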

    What you do not see is the DEF reversed, i.e. you do not see GHIFEDABC. That is because the Latin letters A-Z are "strongly" LTR characters, so Word still lays the run out left to right despite the <w:rtl/> element. The appearance of the DEF does change, however, because Word also marks the run with (in this example) <w:bCs/>, i.e. Bold Complex Script, and that, in combination with the <w:rtl/> element, causes Word to choose a different font to display the run. (That probably depends on the complex script settings of the style you are using.)
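
To see what "strongly" LTR means in Unicode terms, Python's standard `unicodedata` module reports the bidirectional class of each character. This only shows the Unicode character data; whether Word consults these classes in exactly the same way is an assumption. The helper name `bidi_classes` is made up for this example.

```python
import unicodedata
from typing import List


def bidi_classes(text: str) -> List[str]:
    """Return the Unicode bidirectional class of each character in text."""
    return [unicodedata.bidirectional(ch) for ch in text]


# Latin letters have the strong left-to-right class "L".
print(bidi_classes("DEF"))     # ['L', 'L', 'L']

# For comparison: Hebrew alef is strong right-to-left ("R"),
# and Arabic alef is "AL" (Arabic letter, also strong right-to-left).
print(bidi_classes("\u05d0"))  # ['R']
print(bidi_classes("\u0627"))  # ['AL']
```

Because D, E and F are all class L, marking their run as RTL changes where the run is placed relative to its neighbours but not the order of the letters inside it.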