
ASN.1 UTF-8 string decoding


I am writing an ASN.1 parser in C (using the Ericsson ASN1 specification document). I want to decode the UTF-8 string type, but I can't find information about it online, and the document I'm using does not describe the UTF-8 string type in detail. Can anybody provide some code, or explain how to decode it?

I am new to ASN.1.


Solution

  • If you're trying to parse ASN.1, then an excellent introductory resource is Kaliski's ‘Layman’s Guide’ (available at various places on the web, in HTML and PDF). However, that document doesn't mention the UTF8String type.

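    The extra information you need to know is that UTF8String has universal tag 12 (decimal, or 0c in hex), and that its contents are simply the bytes representing the string in the UTF-8 encoding. As with other types, the tag is followed by the length, which here counts bytes rather than characters, and then by those bytes.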

    Thus the string ‘Helló’ (five characters, but six bytes in UTF-8, since ‘ó’ is c3 b3) would be encoded as

    0c 06 48 65 6c 6c c3 b3
    

    (I'm presuming, by the way, that the ‘Ericsson ASN1 specification document’ discusses standard ASN.1, and not some variant.)
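
Since the question asked for code, here is a minimal sketch in C of the layout described above: check for tag 0c, read the length, then copy out the UTF-8 bytes. The function name `decode_utf8string` and its return convention are made up for illustration and do not come from any library. It assumes a definite-length, primitive encoding (as in DER), so it rejects the indefinite-length form. It also does not check that the bytes are well-formed UTF-8, which a real parser probably should.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ASN1_TAG_UTF8STRING 0x0c  /* universal tag 12, primitive */

/*
 * Hypothetical helper (not part of any library): decode one UTF8String
 * TLV from buf[0..buf_len) into out as a NUL-terminated C string.
 * Returns the number of content bytes, or -1 on error.
 * Note: the content may legally contain 0x00 bytes, so callers that care
 * should rely on the returned length rather than strlen().
 */
long decode_utf8string(const unsigned char *buf, size_t buf_len,
                       char *out, size_t out_size)
{
    size_t pos = 0;
    size_t len = 0;
    unsigned char first;

    assert(buf != NULL && out != NULL);

    /* Need at least a tag octet and one length octet. */
    if (buf_len < 2 || buf[pos++] != ASN1_TAG_UTF8STRING)
        return -1;

    first = buf[pos++];
    if (first < 0x80) {
        /* Short form: the octet itself is the length. */
        len = first;
    } else {
        /* Long form: low 7 bits give the number of length octets. */
        size_t n = first & 0x7f;
        if (n == 0 || n > sizeof(size_t) || n > buf_len - pos)
            return -1;  /* indefinite length, too large, or truncated */
        while (n-- > 0)
            len = (len << 8) | buf[pos++];
    }

    /* The content must fit in the input and (with a NUL) in the output. */
    if (len > buf_len - pos || len >= out_size)
        return -1;

    memcpy(out, buf + pos, len);
    out[len] = '\0';
    return (long)len;
}
```

Called on the eight bytes of the ‘Helló’ example, with an output buffer of at least seven bytes, this returns 6 and leaves the UTF-8 bytes 48 65 6c 6c c3 b3 in the buffer. The length is a byte count, so anything that needs the number of characters has to decode the UTF-8 separately.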