Search code examples
javastringunicodeconcatenation

Java String wrong order concatenation of different languages


Screen Shoot of strange java String behavior

So, as you can see from the image, that i have concatenated a,c and b. And i am getting the result i expected. But in 2nd println, when i concatenated a,e and b, i got e concatenated in the end, not where i was expecting it to be. I want to know, the reason of this behavior and solution to this behavior. Thank you in advance.

import java.util.*;
public class prob 
{
    public static void main(String... args)
    {
        String a="الف",b="1/2",c="ب",e="B";

        System.out.println(a+" : "+c+" : "+b);
        System.out.println(a+" : "+e+" : "+b);
    }
}

EDIT(To explain why my question is not a duplicate): My question is on converting L2R languages to R2L.


Solution

  • This is because the first character is R2L (right to left orientation as in asian languages), so next character becames at the begining (correct orientation):

    First char:

    الف 
    // actual orientation ←
    

    Second char added at L

    // add ←
    B : الف 
    // actual orientation →
    

    After this, B is L2R as usual in Europe, so next char (1/2) is added in the right orientation AFTER B:

    // → add in this direction
    B : 1/2 : الف 
    // actual orientation → (still)
    

    You can easily test it by copy paste char and writting manually another, you will see how orientation changes depending of the char you inserted.


    UPDATE:

    what is my solution for this issue, because i made this example only to show what issue i was facing in making some big reports, where data is mix sometimes, it is L2R String and sometimes R2L. And i want to make a string in strictly this format.(

    From this answer:

    • Left-to-right embedding (U+202A)
    • Right-to-left embedding (U+202B)
    • Pop directional formatting (U+202C)

    So in java, to embed a RTL language like Arabic in an LTR language like English, you would do

    myEnglishString + "\u202B" + myArabicString + "\u202C" + moreEnglish
    

    and to do the reverse

    myArabicString + "\u202A" + myEnglishString + "\u202C" + moreArabic
    

    See (for the source material)


    ADD ON 2:

    char l2R = '\u202A';
    System.out.println(l2R + a + " : " + e +" : "+b);
    

    OUTPUT:

    ‪الف : B : 1/2