Why, when I loop over every char of the .NET C# string "Arabic text: ٻڠڣڟگگښڏ",
do I get the wrong letter at position 13: 'ٻ' instead of 'ڏ'?
How do I fix it?
Arabic is written right-to-left. The arrow points to the character at offset 20.
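You're pointing to the last char of the string, not the first Arabic letter. Offset 13 holds the first Arabic letter in memory, which is displayed at the right end of the Arabic run. Here are the string's char values in memory (logical) order: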
0: U+0041 LATIN CAPITAL LETTER A
1: U+0072 LATIN SMALL LETTER R
2: U+0061 LATIN SMALL LETTER A
3: U+0062 LATIN SMALL LETTER B
4: U+0069 LATIN SMALL LETTER I
5: U+0063 LATIN SMALL LETTER C
6: U+0020 SPACE
7: U+0074 LATIN SMALL LETTER T
8: U+0065 LATIN SMALL LETTER E
9: U+0078 LATIN SMALL LETTER X
10: U+0074 LATIN SMALL LETTER T
11: U+003A COLON
12: U+0020 SPACE
13: U+067B ARABIC LETTER BEEH
14: U+06A0 ARABIC LETTER AIN WITH THREE DOTS ABOVE
15: U+06A3 ARABIC LETTER FEH WITH DOT BELOW
16: U+069F ARABIC LETTER TAH WITH THREE DOTS ABOVE
17: U+06AF ARABIC LETTER GAF
18: U+06AF ARABIC LETTER GAF
19: U+069A ARABIC LETTER SEEN WITH DOT BELOW AND DOT ABOVE
20: U+068F ARABIC LETTER DAL WITH THREE DOTS ABOVE DOWNWARDS
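A minimal C# sketch that reproduces the listing above: it prints each char's index and code point, then reads offset 13 and the last char. The Arabic letters are written as \u escapes so that neither the source file's encoding nor the editor's right-to-left display can blur which char is which. It uses top-level statements, so it assumes C# 9 or later. Unicode character names are not printed; as far as I know, the .NET base class library has no API for looking them up.

```csharp
using System;

// The string from the question, in memory (logical) order. The Arabic letters
// are written as escapes: U+067B U+06A0 U+06A3 U+069F U+06AF U+06AF U+069A U+068F.
string s = "Arabic text: \u067B\u06A0\u06A3\u069F\u06AF\u06AF\u069A\u068F";

// Print each UTF-16 char with its index and code point, like the listing above.
for (int i = 0; i < s.Length; i++)
{
    Console.WriteLine($"{i}: U+{(int)s[i]:X4}");
}

// Offset 13 is the first Arabic letter in memory. On screen it appears at the
// right end of the Arabic run, because Arabic is displayed right-to-left.
char first = s[13];          // U+067B ARABIC LETTER BEEH
// The letter the arrow points at is the last char in the string (offset 20).
char last = s[s.Length - 1]; // U+068F ARABIC LETTER DAL WITH THREE DOTS ABOVE DOWNWARDS

Console.WriteLine($"s[13] = U+{(int)first:X4}, s[{s.Length - 1}] = U+{(int)last:X4}");
```

So the loop itself is fine: s[13] really is 'ٻ'. If you want the letter the arrow points at, read the last char; in C# 8 or later, s[^1] is shorthand for s[s.Length - 1].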
And that's without getting into the fact that a grapheme (a single visual element) can be composed of multiple Unicode code points, and that C# strings are UTF-16, so code points outside the Basic Multilingual Plane are stored as surrogate pairs, i.e. two char
values each.
For example, there is a script with the following grapheme:
So the grapheme would be represented by the following sequence of four char
values!
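To iterate by grapheme rather than by char, System.Globalization.StringInfo can split a string into text elements. The sketch below uses a hypothetical example string, "cafe" followed by U+0301 COMBINING ACUTE ACCENT, not the grapheme mentioned above. Since .NET 5, StringInfo follows the Unicode extended grapheme cluster rules; earlier versions used simpler rules, which still group a base letter with its combining marks. It uses top-level statements, so it assumes C# 9 or later.

```csharp
using System;
using System.Globalization;

// Hypothetical example: "cafe" followed by U+0301 COMBINING ACUTE ACCENT.
// The last two code points render as a single grapheme, "é".
string text = "cafe\u0301";

int charCount = text.Length;                                      // UTF-16 code units: 5
int textElementCount = new StringInfo(text).LengthInTextElements; // graphemes: 4

Console.WriteLine($"chars: {charCount}, text elements: {textElementCount}");

// Walk the string one text element (grapheme) at a time instead of one char at a time.
TextElementEnumerator enumerator = StringInfo.GetTextElementEnumerator(text);
while (enumerator.MoveNext())
{
    string element = enumerator.GetTextElement();
    // ElementIndex is the char offset where this text element starts.
    Console.WriteLine($"offset {enumerator.ElementIndex}: {element.Length} char(s)");
}
```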
And no, this isn't just an issue for archaic scripts. "😀" (U+1F600) is used daily by many people, and in a C# string it takes up two char values.
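The emoji case is easy to check directly. A sketch using char.IsSurrogatePair, char.ConvertToUtf32 and string.EnumerateRunes; the last one assumes .NET Core 3.0 or later, and the top-level statements assume C# 9 or later.

```csharp
using System;
using System.Text;

// U+1F600 GRINNING FACE lies outside the Basic Multilingual Plane,
// so UTF-16 needs a surrogate pair (two char values) to store it.
string smiley = "\U0001F600";

Console.WriteLine(smiley.Length);                           // 2
Console.WriteLine(char.IsSurrogatePair(smiley, 0));         // True
Console.WriteLine($"U+{char.ConvertToUtf32(smiley, 0):X}"); // U+1F600

// Enumerating Runes yields whole code points instead of individual chars.
int runeCount = 0;
foreach (Rune rune in smiley.EnumerateRunes())
{
    Console.WriteLine($"rune: U+{rune.Value:X}");
    runeCount++;
}
```

A char loop over this string visits two halves of a surrogate pair, neither of which is a valid character on its own; enumerating Runes gives one code point, and StringInfo gives one grapheme.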