Tags: c#, sql, xml, sql-server-2008, encode

Proper encoding for Sql XML from C# string


So I ran into this error this morning while saving a String to an XML column in SQL Server 2008 R2. The string holds an entire XML document that is sent across a Web Service. The save failed on a special character, with SQL Server throwing an "illegal xml character" error. The special character is a bullet point: •.

I believe I just need to encode whatever input may come from a user when building the XML in the Web Service. I build that XML by creating an XElement array from each of the properties on the object I want to send. My initial thought was to use System.Web.HttpUtility.HtmlEncode() on any and all user input, thinking that would make SQL Server happy, but it hasn't: the same error still appears.
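For context on why HtmlEncode didn't help: XElement already escapes XML-special characters such as `&` and `<` when it serializes, so pre-encoding the input is unnecessary (and double-encodes `&` as `&amp;amp;`). Neither step touches a non-ASCII character like the bullet. A minimal sketch, where `NoteBuilder`, `BuildNote` and the `Note` element name are hypothetical:

```csharp
using System.Xml.Linq;

public static class NoteBuilder
{
    // Hypothetical helper: wraps raw user input in an element.
    // XElement escapes XML-special characters such as '&' and '<' itself,
    // so the input is deliberately NOT run through HtmlEncode first
    // (that would double-encode: "&" would end up as "&amp;amp;").
    // Non-ASCII characters such as '•' are written as literal characters.
    public static string BuildNote(string userInput)
    {
        var element = new XElement("Note", userInput);
        return element.ToString(SaveOptions.DisableFormatting);
    }
}
```

For example, `NoteBuilder.BuildNote("Tom & Jerry • <b>")` escapes the `&` and the `<` but leaves the bullet untouched, which is why encoding the input alone couldn't make the error go away.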

I know that SQL Server's XML data type stores its data as UTF-16, but I don't have my head wrapped around all this encoding stuff well enough to find a solution yet. Can anybody offer any assistance or point me in the right direction?
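It may help to look at the actual bytes. A .NET string is a sequence of UTF-16 code units, and the bullet is the single character U+2022, but each encoding turns it into a different byte sequence. A small sketch (the `EncodingDemo` class and `HexBytes` helper are hypothetical names) that shows the bytes an encoding produces:

```csharp
using System;
using System.Text;

public static class EncodingDemo
{
    // Hypothetical helper: returns the bytes an encoding produces for a
    // string as hex pairs, e.g. "E2-80-A2". GetBytes adds no byte order mark.
    public static string HexBytes(string text, Encoding encoding)
    {
        return BitConverter.ToString(encoding.GetBytes(text));
    }
}
```

`HexBytes("•", Encoding.UTF8)` returns `"E2-80-A2"`, while `HexBytes("•", Encoding.Unicode)` (UTF-16 little-endian) returns `"22-20"`. If a document's declared encoding doesn't match the bytes the parser actually receives, the parser can hit byte sequences that aren't valid characters in the declared encoding, which is one plausible route to an "illegal xml character" error.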


Solution

  • System.Web.HttpUtility.HtmlEncode() doesn't escape most non-ASCII characters; the bullet (U+2022) passes through unchanged. It leaves them to whatever character encoding later turns your XML String into bytes, and that encoding has to match what the other end (here, SQL Server) expects. Obviously that isn't happening in your case.

    The SQL Server docs on the XML data type don't say much about what the expected behavior is here. If your String starts with an XML declaration that names an encoding (e.g. <?xml version="1.0" encoding="UTF-8"?>), I'd start by removing it and seeing what happens: once the string has been converted for transport, that declaration may no longer describe the bytes SQL Server actually parses.

    Failing that, you'll need your own method to escape the non-ASCII characters. Here's one I made earlier:

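    using System;
    using System.Text;

    public static class XmlEscaping
    {
        // Replaces every non-ASCII character with a hexadecimal numeric character
        // reference (e.g. '•' becomes "&#x2022;"), so the result is pure ASCII.
        // Assumes element/attribute names are ASCII and there are no CDATA sections
        // with non-ASCII text: character references are not expanded in those places.
        public static string EscapeNonASCIIChars(string xml)
        {
            var sb = new StringBuilder(xml.Length);
            for (int i = 0; i < xml.Length; i++)
            {
                char c = xml[i];
                if (c < 128)
                    sb.Append(c); // ASCII passes through unchanged
                else if (char.IsHighSurrogate(c) && i + 1 < xml.Length && char.IsLowSurrogate(xml[i + 1]))
                {
                    // A surrogate pair is one code point above U+FFFF: emit a single reference.
                    sb.AppendFormat("&#x{0:x};", char.ConvertToUtf32(c, xml[i + 1]));
                    i++;
                }
                else if (char.IsSurrogate(c))
                    throw new ArgumentException("Unpaired surrogate at index " + i, nameof(xml));
                else
                    sb.AppendFormat("&#x{0:x};", (int)c);
            }
            return sb.ToString();
        }
    }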
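For the first suggestion (dropping the XML declaration), one way is to parse the string and re-serialize only the root element. A minimal sketch, assuming the string is well-formed XML; `XmlDeclarationStripper` and `StripXmlDeclaration` are hypothetical names:

```csharp
using System.Xml.Linq;

public static class XmlDeclarationStripper
{
    // Hypothetical helper: parses the document and re-serializes only the
    // root element, which drops any <?xml ... encoding="..."?> declaration.
    // Caveat: comments or processing instructions outside the root are dropped too.
    public static string StripXmlDeclaration(string xml)
    {
        XDocument doc = XDocument.Parse(xml);
        return doc.Root.ToString(SaveOptions.DisableFormatting);
    }
}
```

SQL Server doesn't preserve the XML declaration in an xml column anyway, so removing it shouldn't lose anything the database keeps.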
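As an alternative to a hand-written escaper (a different technique from the method above, not a replacement for it), an `XmlWriter` that writes to a `Stream` with ASCII encoding replaces characters it can't encode in text content with numeric character references. A sketch under that assumption; `AsciiXmlWriterDemo` and `WriteAsciiElement` are hypothetical names:

```csharp
using System.IO;
using System.Text;
using System.Xml;

public static class AsciiXmlWriterDemo
{
    // Hypothetical helper: writes one element through an ASCII-encoded XmlWriter.
    // Characters that ASCII can't represent in the text are written as
    // numeric character references such as "&#x2022;".
    public static string WriteAsciiElement(string name, string text)
    {
        var settings = new XmlWriterSettings
        {
            Encoding = Encoding.ASCII,   // only honoured when writing to a Stream
            OmitXmlDeclaration = true    // avoid an encoding="us-ascii" declaration
        };
        using (var stream = new MemoryStream())
        {
            using (XmlWriter writer = XmlWriter.Create(stream, settings))
            {
                writer.WriteElementString(name, text);
            }
            return Encoding.ASCII.GetString(stream.ToArray());
        }
    }
}
```

With this, `WriteAsciiElement("Note", "Bullet • point")` should produce `<Note>Bullet &#x2022; point</Note>`. The design trade-off is that the writer handles escaping only where references are legal, instead of blindly rewriting the whole string.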