
Checking if a string is encodable in GSM 03.38


I am working on an SMS application and I send out Unicode characters (Amharic/G'eez). I am using this example. The method on line 240, isEncodeableInGsm0338(), is used to check if I should use another encoding or the default encoding.

Here is the code

public static boolean isEncodeableInGsm0338(String isoString) {
    byte[] isoBytes = isoString.getBytes();
    outer:
    for (int i = 0; i < isoBytes.length; i++) {
        for (int j = 0; j < isoGsm0338Array.length; j++) {
            if (isoGsm0338Array[j] == isoBytes[i]) {
                continue outer;
            }
        }
        for (int j = 0; j < extendedIsoGsm0338Array.length; j++) {
            if (extendedIsoGsm0338Array[j][1] == isoBytes[i]) {
                continue outer;
            }
        }
        return false;
    }
    return true;
}

Here is the catch. The string "የእንግሊዝ ፕሪምየር ሊግ ነህሴ 6 ይጀምራል።", which is clearly Unicode, makes that method return true. My hypothesis is that the method only looks at half of each letter, but I can't support that theory. If I change the text to "1. የእንግሊዝ ፕሪምየር ሊግ ነህሴ 6 ይጀምራል።", the method correctly detects that it is not encodable.

What is happening here?


Solution

  • Got it!

    The method on line 240 is as follows.

    public static boolean isEncodeableInGsm0338(String isoString) {
        byte[] isoBytes = isoString.getBytes();
        outer:
        for (int i = 0; i < isoBytes.length; i++) {
            for (int j = 0; j < isoGsm0338Array.length; j++) {
                if (isoGsm0338Array[j] == isoBytes[i]) {
                    continue outer;
                }
            }
            for (int j = 0; j < extendedIsoGsm0338Array.length; j++) {
                if (extendedIsoGsm0338Array[j][1] == isoBytes[i]) {
                    continue outer;
                }
            }
            return false;
        }
        return true;
    }
    

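    As you can see, it uses isoString.getBytes(), which is encoding dependent: the bytes it returns depend on the platform's default charset, not only on the characters in the string. The solution is to compare each char instead, by getting the character array with isoString.toCharArray().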

    I changed

    byte[] isoBytes = isoString.getBytes();

    to

    char[] isoBytes = isoString.toCharArray();

    You might want to rename isoBytes to something else too, since it no longer holds bytes. Works like a charm.
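
    To see why the byte-based check can be fooled, here is a minimal sketch. It assumes the default charset was a single-byte one that cannot represent Ethiopic (ISO-8859-1 stands in for it here); the class name, the helper names, and the tiny ALLOWED table are hypothetical and are not the real GSM 03.38 arrays. It suggests one plausible cause of the false positive: every unmappable character is replaced by '?', which is itself a GSM 03.38 character.

    ```java
    import java.nio.charset.StandardCharsets;

    public class CharVsByteDemo {
        // Deliberately tiny stand-in for the real GSM 03.38 tables (assumption).
        static final String ALLOWED = " .0123456789?";

        // Byte-based check, in the spirit of the original method.
        public static boolean allBytesAllowed(byte[] bytes) {
            for (byte b : bytes) {
                if (ALLOWED.indexOf((char) (b & 0xFF)) < 0) {
                    return false;
                }
            }
            return true;
        }

        // Char-based check, in the spirit of the fix.
        public static boolean allCharsAllowed(String text) {
            for (char c : text.toCharArray()) {
                if (ALLOWED.indexOf(c) < 0) {
                    return false;
                }
            }
            return true;
        }

        public static void main(String[] args) {
            // Two Ethiopic code points, written as escapes so the source
            // file's own encoding does not matter.
            String ethiopic = "\u12E8\u12A5";

            // getBytes(Charset) replaces unmappable characters with the
            // charset's replacement byte, which is '?' for ISO-8859-1.
            byte[] latin1 = ethiopic.getBytes(StandardCharsets.ISO_8859_1);
            byte[] utf8 = ethiopic.getBytes(StandardCharsets.UTF_8);

            System.out.println(latin1.length + " bytes in ISO-8859-1");
            System.out.println(utf8.length + " bytes in UTF-8");
            System.out.println("byte check: " + allBytesAllowed(latin1));
            System.out.println("char check: " + allCharsAllowed(ethiopic));
        }
    }
    ```

    With these assumptions the byte check accepts the ISO-8859-1 bytes, while the char check rejects the string. Working on char values avoids the charset question entirely, because each char is compared as the UTF-16 code unit it actually is.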