Convert a string with Unicode characters to hexadecimal with a specific format in C#


I am working with a third party for sending SMS messages. The information is sent to them via HTTPS, in a SOAP envelope.

If the text contains Unicode characters, DataCoding needs to be set to 2 (Unicode) and the text, from what I understand, needs to be sent in a specific format: a hexadecimal NCR (numeric character reference) of the form "&#xHHHH;" for each Unicode character.

I am not very familiar with SOAP or with this format. How can that conversion be done on a string containing both ASCII and non-ASCII Unicode characters?


Solution

  • The following might solve your encoding task:

    using System;
    using System.Linq;
    using System.Text;
    
    public class Test
    {
        public static void Main()
        {
            string encoded = 
                ncrEncode("My string with äöüÄÖÜß");
                
            Console.WriteLine(encoded);
        }
        
        public static string ncrEncode(string s)
        {
            StringBuilder sb = new();
            
            foreach(char c in s)
            {
                uint u = (uint)c;
                if (u > 127)
                {
                    // Non-ASCII: emit a hexadecimal NCR, including the closing ';'.
                    sb.Append($"&#x{u:X4};");
                }
                else
                {
                    // ASCII characters are copied through unchanged.
                    sb.Append(c);
                }
            }
            
            return sb.ToString();
        }
    
        // Same logic as ncrEncode, written as a single LINQ expression.
        public static string ncrEncodeLinq(string s)
        {
            return string.Join("", 
                s.Select(c => (uint)c < 128 ? $"{c}" : $"&#x{(uint)c:X4};"));
        }
    }
    

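    The string to be encoded is looped through character by character. Characters with codes greater than 127 are appended as hexadecimal NCRs padded to at least four digits; all other characters are appended unchanged.

    For the fun of it, I have also squeezed the function into a LINQ one-liner (ncrEncodeLinq).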
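
One caveat with looping over char values: characters outside the Basic Multilingual Plane (most emoji, for example) are stored as two UTF-16 surrogates, so each half would get its own NCR. If that matters for your messages, a variant that walks the string by code point could look roughly like the sketch below. The names NcrSketch and EncodeByCodePoint are made up for illustration, the code assumes well-formed UTF-16 input (char.ConvertToUtf32 throws on a lone surrogate), and whether the SMS provider accepts references above FFFF is something I have not verified.

```csharp
using System;
using System.Text;

public static class NcrSketch
{
    // Hypothetical helper: encodes non-ASCII text as hexadecimal NCRs,
    // one reference per Unicode code point rather than per UTF-16 char.
    public static string EncodeByCodePoint(string s)
    {
        var sb = new StringBuilder();

        for (int i = 0; i < s.Length; i++)
        {
            // A surrogate pair occupies two chars but is a single code point.
            bool isPair = char.IsSurrogatePair(s, i);

            // Combines the pair at index i, or returns the single char's value.
            // Throws ArgumentException for a lone surrogate (assumed not to occur).
            int codePoint = char.ConvertToUtf32(s, i);

            if (codePoint > 127)
            {
                // "X4" pads to at least four hex digits; larger values use more.
                sb.Append("&#x").Append(codePoint.ToString("X4")).Append(';');
            }
            else
            {
                sb.Append((char)codePoint);
            }

            if (isPair)
            {
                i++; // skip the low surrogate that was already consumed
            }
        }

        return sb.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(EncodeByCodePoint("My string with äöü and \U0001F600"));
    }
}
```

For strings that only contain characters up to U+FFFF, this produces the same output as the char-based loop.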