I am working on an SMS application that sends out Unicode text (Amharic/Ge'ez). I am using this example. The method on line 240, isEncodeableInGsm0338(), is used to check whether I can use the default encoding or have to switch to another one.
Here is the code:
public static boolean isEncodeableInGsm0338(String isoString) {
    byte[] isoBytes = isoString.getBytes();
    outer:
    for (int i = 0; i < isoBytes.length; i++) {
        for (int j = 0; j < isoGsm0338Array.length; j++) {
            if (isoGsm0338Array[j] == isoBytes[i]) {
                continue outer;
            }
        }
        for (int j = 0; j < extendedIsoGsm0338Array.length; j++) {
            if (extendedIsoGsm0338Array[j][1] == isoBytes[i]) {
                continue outer;
            }
        }
        return false;
    }
    return true;
}
Here is the catch. The string "የእንግሊዝ ፕሪምየር ሊግ ነህሴ 6 ይጀምራል።", which is clearly Unicode, returns true from that method. My hypothesis is that the method only looks at half of each letter, but I can't support that theory. If I change the text to "1. የእንግሊዝ ፕሪምየር ሊግ ነህሴ 6 ይጀምራል።", it is detected correctly.
What is happening here?
Got it!
The method on line 240 is as follows.
public static boolean isEncodeableInGsm0338(String isoString) {
    byte[] isoBytes = isoString.getBytes();
    outer:
    for (int i = 0; i < isoBytes.length; i++) {
        for (int j = 0; j < isoGsm0338Array.length; j++) {
            if (isoGsm0338Array[j] == isoBytes[i]) {
                continue outer;
            }
        }
        for (int j = 0; j < extendedIsoGsm0338Array.length; j++) {
            if (extendedIsoGsm0338Array[j][1] == isoBytes[i]) {
                continue outer;
            }
        }
        return false;
    }
    return true;
}
As you can see, it uses isoString.getBytes(), which is encoding dependent: the bytes it returns depend on the platform's default charset. The solution is to compare each char instead, by getting the character array with isoString.toCharArray().
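To illustrate what "encoding dependent" means, here is a minimal sketch. The class name GetBytesDemo and the helper encode are made up for this example, and the two charsets are only stand-ins for whatever the platform default happens to be. The same two Ethiopic characters come out as different bytes depending on the charset, while toCharArray() always yields the same UTF-16 code units.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class GetBytesDemo {
    // Hypothetical helper: encodes text with an explicit charset.
    // The no-argument getBytes() does the same with the platform default charset.
    static byte[] encode(String text, Charset charset) {
        return text.getBytes(charset);
    }

    public static void main(String[] args) {
        // Two Ethiopic characters, written as escapes so the source file's
        // own encoding does not matter.
        String text = "\u12E8\u12A5";

        byte[] latin1 = encode(text, StandardCharsets.ISO_8859_1);
        byte[] utf8 = encode(text, StandardCharsets.UTF_8);
        char[] chars = text.toCharArray();

        System.out.println("ISO-8859-1: " + latin1.length + " bytes, first = " + latin1[0]);
        System.out.println("UTF-8:      " + utf8.length + " bytes, first = " + utf8[0]);
        System.out.println("chars:      " + chars.length + " chars, first = " + (int) chars[0]);
    }
}
```

ISO-8859-1 cannot represent these characters, so each one is replaced by '?' (byte 63), which is itself a valid GSM 03.38 character. With a default charset like that, non-GSM text could look encodable to a byte-based check. Whether that is exactly what happened here depends on the default charset of the machine, which I have not verified.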
I changed
byte[] isoBytes = isoString.getBytes();
to
char[] isoBytes = isoString.toCharArray();
You might want to rename isoBytes to something else too, since it no longer holds bytes. Works like a charm.
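For reference, here is a self-contained sketch of the method after the change. The class name Gsm0338Check is made up, and the two tables are heavily abbreviated stand-ins that I invented for this example (the library's real isoGsm0338Array and extendedIsoGsm0338Array are much longer and may use a different element type). Only the loop structure mirrors the original.

```java
public class Gsm0338Check {
    // Hypothetical, abbreviated stand-in for the library's basic table:
    // space, some punctuation, digits and lowercase letters only.
    private static final char[] isoGsm0338Array =
            " .,?!0123456789abcdefghijklmnopqrstuvwxyz".toCharArray();

    // Hypothetical, abbreviated stand-in for the extension table.
    // Each row is {GSM extension code, character}; only column 1 is compared.
    private static final char[][] extendedIsoGsm0338Array = {
            {0x3C, '['}, {0x3E, ']'}, {0x28, '{'}, {0x29, '}'}
    };

    public static boolean isEncodeableInGsm0338(String text) {
        // Compare UTF-16 chars, not bytes, so the result no longer depends
        // on the platform default charset.
        char[] chars = text.toCharArray();
        outer:
        for (int i = 0; i < chars.length; i++) {
            for (int j = 0; j < isoGsm0338Array.length; j++) {
                if (isoGsm0338Array[j] == chars[i]) {
                    continue outer;
                }
            }
            for (int j = 0; j < extendedIsoGsm0338Array.length; j++) {
                if (extendedIsoGsm0338Array[j][1] == chars[i]) {
                    continue outer;
                }
            }
            return false; // this char is in neither table
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isEncodeableInGsm0338("hello [1]"));
        System.out.println(isEncodeableInGsm0338("1. \u12E8\u12A5"));
    }
}
```

With these stand-in tables, plain text made of listed characters passes, and any string containing an Ethiopic character is rejected regardless of what comes before it.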