
Meaning of "data_coding" field in SMPP


What is the meaning of "data_coding" field in the SMPP protocol?

I searched for this but couldn't find any helpful resource.


Solution

  • In short, data_coding tells the receiver how the text of an SMPP message such as SubmitSM (a typical outgoing SMS) is encoded. The SubmitSM packet carries the message body as raw bytes, and data_coding identifies the character encoding used to produce those bytes.

    The most important values are:

    • 00000000 (0) - the SMSC default alphabet. This is usually GSM7 (the default 7-bit encoding for messages, in which a few characters such as € take two septets), but the SMSC may be configured to use something else
    • 00000011 (3) - ISO-8859-1 (Latin-1)
    • 00001000 (8) - UCS-2 (ISO/IEC 10646), in practice treated as UTF-16
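
As a rough illustration of the values above, here is a minimal Python sketch that decodes a message body according to its data_coding. The function name `decode_body` and the `CODECS` table are hypothetical, not part of any SMPP library. The sketch assumes that value 8 is big-endian UTF-16, and it does not handle value 0 because Python's standard library has no GSM7 codec.

```python
# Hypothetical helper: maps common data_coding values to Python codec names.
CODECS = {
    3: "latin-1",    # ISO-8859-1
    8: "utf-16-be",  # UCS-2 / UTF-16, assumed big-endian
}


def decode_body(body: bytes, data_coding: int) -> str:
    """Decode the raw bytes of a message body into text."""
    if data_coding == 0:
        # 0 means "SMSC default alphabet" (usually GSM7); a GSM7 decoder
        # would have to come from a third-party package or be written by hand.
        raise NotImplementedError("no GSM7 codec in the standard library")
    if data_coding not in CODECS:
        raise ValueError(f"unsupported data_coding: {data_coding}")
    return body.decode(CODECS[data_coding])


print(decode_body(b"\xe9", 3))      # single Latin-1 byte
print(decode_body(b"\x04\x1f", 8))  # one 16-bit code unit
```

The same two bytes would decode to different text under a different data_coding, which is why the field has to travel with the body.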

    Other possible values (rarely used):

    • 00000001 - IA5_CCITT_T_50_ASCII_ANSI_X3_4
    • 00000010 - OCTET_UNSPECIFIED_8BIT_BINARY_1
    • 00000100 - OCTET_UNSPECIFIED_8BIT_BINARY_2
    • 00000101 - JIS_X_02081990
    • 00000110 - CYRILLIC_ISO88595
    • 00000111 - LATIN_HEBREW_ISO88598
    • 00001001 - PICTOGRAM_ENCODING
    • 00001010 - ISO2022JP_MUSIC_CODES
    • 00001101 - EXTENDED_KANJI_JISX_02121990
    • 00001110 - KS_C_5601

    And two reserved for special uses:

    • 00001011 - RESERVED #1
    • 00001100 - RESERVED #2

    To summarise: if your binary body is Unicode (UTF-16), set data_coding to 8. If your message is stored as GSM7, it will (usually) be 0.
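
The rule of thumb above could be sketched in Python as follows. This is only an illustration: `choose_data_coding` is a hypothetical helper, and it assumes that the SMSC default alphabet really is GSM7, which should be confirmed with the provider. The character tables follow the GSM 7-bit default alphabet and its extension table.

```python
# GSM 7-bit default alphabet (the escape character itself is left out).
GSM7_BASIC = (
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ"
    " !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmnopqrstuvwxyzäöñüà"
)
# Extension table: each of these costs two septets (escape + character).
GSM7_EXTENSION = "^{}\\[~]|€\f"


def choose_data_coding(text: str) -> int:
    """Return 0 if the text fits GSM7, otherwise 8 (UCS-2 / UTF-16)."""
    if all(ch in GSM7_BASIC or ch in GSM7_EXTENSION for ch in text):
        return 0  # assumes the SMSC default alphabet is GSM7
    return 8


def gsm7_septets(text: str) -> int:
    """Count septets for a text that is already known to fit GSM7."""
    return sum(2 if ch in GSM7_EXTENSION else 1 for ch in text)


print(choose_data_coding("Hello world"))
print(choose_data_coding("Привет"))
print(gsm7_septets("Price: 5€"))
```

When the function returns 8, the body would then be encoded as UTF-16 before being placed in the packet. When it returns 0, the text still has to be converted to GSM7 by a codec that this sketch does not provide.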