unicode, internationalization, right-to-left, left-to-right

How do I embed arbitrary unicode without messing up the rest of the line?


We have records of the form header|sequence|string1|string2|directive, where string1 and string2 are arbitrary Unicode junk. Assume the input can be really trashy Unicode (I expect it to contain things like right-to-left text, unbalanced Unicode direction control characters, etc.) but not actually malicious. How can I get these strings to display in order?

The final target is an HTML website, but we believe it's best to process the data as a string for as long as possible. Blindly jamming a force-LTR character before each | is not remotely acceptable: it tends to carry into the text past the | and cause RTL text to render as LTR.

First step: replace control codes with control pictures
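
A minimal sketch of this first step, assuming "control codes" means the C0 range (U+0000 to U+001F) plus DEL (U+007F), which map one-to-one onto the Control Pictures block (U+2400 to U+241F, and U+2421 for DEL). The function name `to_control_pictures` is made up for illustration. C1 controls (U+0080 to U+009F) have no control pictures, so this sketch leaves them untouched.

```python
def to_control_pictures(text: str) -> str:
    """Replace C0 control codes and DEL with visible Control Pictures."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x20:
            # U+0000..U+001F map onto U+2400..U+241F in the same order
            out.append(chr(0x2400 + cp))
        elif cp == 0x7F:
            # DEL has its own picture, U+2421 SYMBOL FOR DELETE
            out.append("\u2421")
        else:
            out.append(ch)
    return "".join(out)
```

Note that this also replaces tab, carriage return and newline, which is probably what you want for a record that must stay on one line.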

Second step: fix RTL nonsense ??

I have to admit I was expecting the RTL stack to be simpler than it is. I cannot simply run the bidi algorithm myself, because there's no way to know the directionality of a private-use character.
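
One string-level option for this second step, offered as an untested-in-production sketch and not the method that was eventually adopted: wrap each field in a directional isolate (FSI U+2068 ... PDI U+2069) and let the renderer resolve direction, so nothing has to guess the directionality of individual characters. For the wrapper to hold, the isolates inside the field must be balanced: a stray PDI would close the wrapper early, and an unclosed LRI/RLI/FSI would swallow the wrapper's closing PDI. Unbalanced embeddings and overrides (LRE, RLE, LRO, RLO, PDF) are left alone, because their scope ends at the enclosing isolate's PDI. The names `balance_isolates`, `isolate_field` and `join_fields` are hypothetical, and the sketch assumes paragraph separators such as U+2029 have already been removed, since an isolate cannot span a paragraph break.

```python
FSI, PDI = "\u2068", "\u2069"
ISOLATE_INITIATORS = {"\u2066", "\u2067", "\u2068"}  # LRI, RLI, FSI


def balance_isolates(text: str) -> str:
    """Drop unmatched PDIs and close any isolates left open at the end."""
    depth = 0
    out = []
    for ch in text:
        if ch in ISOLATE_INITIATORS:
            depth += 1
        elif ch == PDI:
            if depth == 0:
                continue  # stray PDI: would terminate the wrapper early
            depth -= 1
        out.append(ch)
    out.append(PDI * depth)  # close whatever the input left open
    return "".join(out)


def isolate_field(text: str) -> str:
    """Wrap one field so its direction cannot leak into its neighbours."""
    return FSI + balance_isolates(text) + PDI


def join_fields(fields, sep="|"):
    """Join already-cleaned fields with separators that sit outside every isolate."""
    return sep.join(isolate_field(f) for f in fields)
```

Because the separators sit outside every isolate, they take the direction of the surrounding paragraph, so the line as a whole still needs an LTR base direction (in HTML, the default or an explicit `dir=ltr` on the container).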


Solution

  • We ended up with this kludgy method. It works. (Note that in the production code these inline styles turn into a class reference.)

    <PRE><DIV DIR=LTR STYLE="display:inline-block;">|</DIV><DIV STYLE="display:inline-block;">something1</DIV><DIV DIR=LTR STYLE="display:inline-block;">|</DIV><DIV STYLE="display:inline-block;">something2</DIV><DIV DIR=LTR STYLE="display:inline-block;">|</DIV></PRE>
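
For illustration, here is a small sketch of how markup like the above could be generated from a list of fields. The helper name `render_line` is hypothetical, and the HTML-escaping of each field is an assumption added here, since arbitrary input may contain `<` or `&`; it is not something the snippet above shows.

```python
from html import escape

# Separator block, forced LTR, copied from the markup above
SEP = '<DIV DIR=LTR STYLE="display:inline-block;">|</DIV>'


def render_line(fields):
    """Build one <PRE> line: a separator block before, between and after the field blocks."""
    cells = [
        '<DIV STYLE="display:inline-block;">' + escape(f) + "</DIV>"
        for f in fields
    ]
    return "<PRE>" + SEP + SEP.join(cells) + SEP + "</PRE>"
```

Calling `render_line(["something1", "something2"])` reproduces the line shown above. In production the inline styles would become class references, as noted.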