I tried the following to loop through the characters in my string and print them. All of them print fine except the Deseret Long I (𐐀). I don't know whether there is another way to do this so that 𐐀 is printed correctly. Here is my code:
package javaapplication13;

public class JavaApplication13 {
    public static void main(String[] args) {
        String s = "h𤍡y𐐀\u0500";
        System.out.println(s);
        final int length = s.length();
        for (int offset = 0; offset < length;) {
            final int codepoint = s.codePointAt(offset);
            System.out.println((char) (codepoint));
            offset += Character.charCount(codepoint);
        }
    }
}
The output looks like this (NetBeans):
run:
h𤍡y𐐀Ԁ
h
䍡
y
Ѐ
Ԁ
BUILD SUCCESSFUL (total time: 0 seconds)
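Your problem is caused by casting the int code point to char (4 bytes to 2 bytes). A code point above U+FFFF, such as 𐐀 (U+10400), cannot fit in one char; in UTF-16 it is stored as a surrogate pair. The cast simply drops the upper bits, so U+10400 becomes U+0400 (Ѐ), which is exactly what your output shows. 𤍡 comes out as 䍡 for the same reason.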
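A minimal sketch of what the cast does to U+10400, compared with Character.toChars, which keeps both halves of the pair. The class name SurrogateDemo is just for illustration; the expected values in the comments follow from UTF-16 encoding (0x10400 - 0x10000 = 0x400, split into high surrogate 0xD800 + 0x1 and low surrogate 0xDC00 + 0x0).

```java
public class SurrogateDemo {
    public static void main(String[] args) {
        int codepoint = 0x10400; // DESERET CAPITAL LETTER LONG I

        // Above U+FFFF, so UTF-16 needs two chars for it
        System.out.println(Character.isSupplementaryCodePoint(codepoint)); // true
        System.out.println(Character.charCount(codepoint));                // 2

        // The (char) cast keeps only the low 16 bits: 0x10400 -> 0x0400
        char truncated = (char) codepoint;
        System.out.printf("U+%04X%n", (int) truncated); // U+0400, CYRILLIC CAPITAL LETTER IE WITH GRAVE

        // Character.toChars keeps the whole value as a high/low surrogate pair
        char[] pair = Character.toChars(codepoint);
        System.out.printf("U+%04X U+%04X%n", (int) pair[0], (int) pair[1]); // U+D801 U+DC00
    }
}
```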
A surrogate pair is, as the name says, two chars, so you have to keep both of them. The simplest way to print the character is to take both chars straight from the string with the String.substring() method. Alternatively, you can convert the code point to an array of chars with char[] ch = Character.toChars(codepoint); and turn that array back into a string with new String(ch).
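Putting both suggestions into the loop from the question might look like the sketch below. The class and method names are hypothetical, and the test string is written with \u escapes (\uD801\uDC00 is 𐐀) so the source file stays ASCII. The codePoints() line at the end is an extra Java 8+ option, not one of the two approaches above.

```java
import java.util.ArrayList;
import java.util.List;

public class PrintCodePoints {

    // Splits s into one String per code point using String.substring,
    // so both halves of a surrogate pair stay together.
    static List<String> splitWithSubstring(String s) {
        List<String> parts = new ArrayList<>();
        final int length = s.length();
        for (int offset = 0; offset < length; ) {
            final int codepoint = s.codePointAt(offset);
            final int count = Character.charCount(codepoint); // 2 for a surrogate pair, else 1
            parts.add(s.substring(offset, offset + count));
            offset += count;
        }
        return parts;
    }

    // Same result, but rebuilds each character from the code point itself
    // with Character.toChars, which returns one or two chars.
    static List<String> splitWithToChars(String s) {
        List<String> parts = new ArrayList<>();
        final int length = s.length();
        for (int offset = 0; offset < length; ) {
            final int codepoint = s.codePointAt(offset);
            parts.add(new String(Character.toChars(codepoint)));
            offset += Character.charCount(codepoint);
        }
        return parts;
    }

    public static void main(String[] args) {
        // "h", "y", the surrogate pair for U+10400, then U+0500
        String s = "hy\uD801\uDC00\u0500";
        for (String part : splitWithSubstring(s)) {
            System.out.println(part);
        }
        // Java 8+: String.codePoints() yields the code points directly
        s.codePoints().forEach(cp -> System.out.println(new String(Character.toChars(cp))));
    }
}
```

Both helpers return the same list; substring() reuses the chars already in the string, while toChars() is handy when you only have the code point and not the original string.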