Search code examples
androidencodingescapinggsmsms-gateway

Escape Codes in GSM 03.38 Encoding Scheme


I am working on an android sms gateway project which receives sms from client app from different countries. In the client app I use {, }, [ and ] for identifying the beginning and ending of some portion of the message which is sent as SMS to the gateway. As given in this link, these characters are special characters that are escaped. Sometimes the sms received by the gateway cannot decode the escaped characters and it would be displayed as _(, _), _< and _> respectively. This can be handled in the following ways.

  1. One way to handle this is to change the encoding mechanism in the client app and use only the default character set.
  2. Other way is to deal with the escape character in the gateway.

I don't want to take option 1 as the solution would not be backward compatible and supporting the product could become difficult.

I would like to take option 2, but I'm not sure how exactly I can do it. In particular, I'm not sure what are the different escape characters that could be used(_ is one of them) and how can a escape code be detected.

This problem is seen mostly in cross country sms'es.

Any help is highly appreciated.


Solution

  • You have a couple of options:

    1. Use the Android SMS decoding system (it is relatively good, not perfect but good enough)

    2. Following the GSM specifications you can do the following decoding:

      Whilst decoding each character you must verify whether the character is part of the "default character set" or if it's an escape to the "extended character set". The escape character is 0x1B this tells your decoder then that the next character must be from the "extended character set". If your decoder does not find this character to be in the "extended character set" then by GSM specification it must first attempt to find this character in the "default character set". If it is also not in the "default character set" then a space character must be used.

    For example:

    if (escaped) {
      char ext = (char) GSMUtils.BYTE_TO_CHAR_ESCAPED[val];
      // if no character defined then do fall back
      ext = ext != -1 ? ext : (char) GSMUtils.BYTE_TO_CHAR[val];
      // if no character defined then fall back to <space> 
      return ext != -1 ? ext : ' '; 
    } else {
      char ch = (char) GSMUtils.BYTE_TO_CHAR[val];
      // if no character defined then fall back to <space>
      return ch != -1 ? ch : ' '; 
    }
    

    Where GSMUtils.BYTE_TO_CHAR_ESCAPED and GSMUtils.BYTE_TO_CHAR are int[]'s

    private static final int[] BYTE_TO_CHAR = {
        0x0040, 0x00A3, 0x0024, 0x00A5, 0x00E8, 0x00E9, 0x00F9, 0x00EC,
        0x00F2, 0x00E7, 0x000A, 0x00D8, 0x00F8, 0x000D, 0x00C5, 0x00E5,
        0x0394, 0x005F, 0x03A6, 0x0393, 0x039B, 0x03A9, 0x03A0, 0x03A8,
        0x03A3, 0x0398, 0x039E, 0x00A0, 0x00C6, 0x00E6, 0x00DF, 0x00C9,
        0x0020, 0x0021, 0x0022, 0x0023, 0x00A4, 0x0025, 0x0026, 0x0027,
        0x0028, 0x0029, 0x002A, 0x002B, 0x002C, 0x002D, 0x002E, 0x002F,
        0x0030, 0x0031, 0x0032, 0x0033, 0x0034, 0x0035, 0x0036, 0x0037,
        0x0038, 0x0039, 0x003A, 0x003B, 0x003C, 0x003D, 0x003E, 0x003F,
        0x00A1, 0x0041, 0x0042, 0x0043, 0x0044, 0x0045, 0x0046, 0x0047,
        0x0048, 0x0049, 0x004A, 0x004B, 0x004C, 0x004D, 0x004E, 0x004F,
        0x0050, 0x0051, 0x0052, 0x0053, 0x0054, 0x0055, 0x0056, 0x0057,
        0x0058, 0x0059, 0x005A, 0x00C4, 0x00D6, 0x00D1, 0x00DC, 0x00A7,
        0x00BF, 0x0061, 0x0062, 0x0063, 0x0064, 0x0065, 0x0066, 0x0067,
        0x0068, 0x0069, 0x006A, 0x006B, 0x006C, 0x006D, 0x006E, 0x006F,
        0x0070, 0x0071, 0x0072, 0x0073, 0x0074, 0x0075, 0x0076, 0x0077,
        0x0078, 0x0079, 0x007A, 0x00E4, 0x00F6, 0x00F1, 0x00FC, 0x00E0,
            -1,     -1,     -1,     -1,     -1,     -1,     -1,     -1,
            -1,     -1,     -1,     -1,     -1,     -1,     -1,     -1,
            -1,     -1,     -1,     -1,     -1,     -1,     -1,     -1,
            -1,     -1,     -1,     -1,     -1,     -1,     -1,     -1,
            -1,     -1,     -1,     -1,     -1,     -1,     -1,     -1,
            -1,     -1,     -1,     -1,     -1,     -1,     -1,     -1,
            -1,     -1,     -1,     -1,     -1,     -1,     -1,     -1,
            -1,     -1,     -1,     -1,     -1,     -1,     -1,     -1,
            -1,     -1,     -1,     -1,     -1,     -1,     -1,     -1,
            -1,     -1,     -1,     -1,     -1,     -1,     -1,     -1,
            -1,     -1,     -1,     -1,     -1,     -1,     -1,     -1,
            -1,     -1,     -1,     -1,     -1,     -1,     -1,     -1,
            -1,     -1,     -1,     -1,     -1,     -1,     -1,     -1,
            -1,     -1,     -1,     -1,     -1,     -1,     -1,     -1,
            -1,     -1,     -1,     -1,     -1,     -1,     -1,     -1,
            -1,     -1,     -1,     -1,     -1,     -1,     -1,     -1,
    };
    

    and

    private static final int[] BYTE_TO_CHAR_ESCAPED = {
        -1,     -1,     -1,     -1,     -1,     -1,     -1,     -1,
        -1,     -1, 0x000C,     -1,     -1,     -1,     -1,     -1,
        -1,     -1,     -1,     -1, 0x005E,     -1,     -1,     -1,
        -1,     -1,     -1,     -1,     -1,     -1,     -1,     -1,
        -1,     -1,     -1,     -1,     -1,     -1,     -1,     -1,
    0x007B, 0x007D,     -1,     -1,     -1,     -1,     -1, 0x005C,
        -1,     -1,     -1,     -1,     -1,     -1,     -1,     -1,
        -1,     -1,     -1,     -1, 0x005B, 0x007E, 0x005D,     -1,
    0x007C,     -1,     -1,     -1,     -1,     -1,     -1,     -1,
        -1,     -1,     -1,     -1,     -1,     -1,     -1,     -1,
        -1,     -1,     -1,     -1,     -1,     -1,     -1,     -1,
        -1,     -1,     -1,     -1,     -1,     -1,     -1,     -1,
        -1,     -1,     -1,     -1,     -1,     0x20AC, -1,     -1,
        -1,     -1,     -1,     -1,     -1,     -1,     -1,     -1,
        -1,     -1,     -1,     -1,     -1,     -1,     -1,     -1,
        -1,     -1,     -1,     -1,     -1,     -1,     -1,     -1,
        -1,     -1,     -1,     -1,     -1,     -1,     -1,     -1,
        -1,     -1,     -1,     -1,     -1,     -1,     -1,     -1,
        -1,     -1,     -1,     -1,     -1,     -1,     -1,     -1,
        -1,     -1,     -1,     -1,     -1,     -1,     -1,     -1,
        -1,     -1,     -1,     -1,     -1,     -1,     -1,     -1,
        -1,     -1,     -1,     -1,     -1,     -1,     -1,     -1,
        -1,     -1,     -1,     -1,     -1,     -1,     -1,     -1,
        -1,     -1,     -1,     -1,     -1,     -1,     -1,     -1,
        -1,     -1,     -1,     -1,     -1,     -1,     -1,     -1,
        -1,     -1,     -1,     -1,     -1,     -1,     -1,     -1,
        -1,     -1,     -1,     -1,     -1,     -1,     -1,     -1,
        -1,     -1,     -1,     -1,     -1,     -1,     -1,     -1,
        -1,     -1,     -1,     -1,     -1,     -1,     -1,     -1,
        -1,     -1,     -1,     -1,     -1,     -1,     -1,     -1,
        -1,     -1,     -1,     -1,     -1,     -1,     -1,     -1,
        -1,     -1,     -1,     -1,     -1,     -1,     -1,     -1,
    };
    

    I might add that your decoding issues would not necessarily be just a cross-country issue. As there are a number of characters in the extended character set that are often used in national sms's.

    Here is the root source of information from 3GPP specifying how to handle the default/extended character sets. Pay special attention to the notes under each of the tables!!