Search code examples
sql-serverxmlunicodebyte-order-markvarbinary

SqlServer Converting XML to varbinary and parsing it in .NET (C#)


Consider the following code:

[Test]
    public void StackOverflowQuestionTest()
    {
        const string connectionString = "enter your connection string if you wanna test this code";

        byte[] result = null;
        using (var connection = new SqlConnection(connectionString))
        {
            connection.Open();
            using (var sqlCommand = new SqlCommand("declare @xml as xml = '<xml/>' SELECT convert(varbinary(max), @xml) as value"))
            //using (var sqlCommand = new SqlCommand("SELECT convert(varbinary(max), N'<xml/>') as value"))
            {
                sqlCommand.Connection = connection;

                using (SqlDataReader reader = sqlCommand.ExecuteReader())
                {
                    while (reader.Read())
                    {
                        result = (byte[])reader["value"];
                    }

                    reader.Close();
                }
            }
        }

        string decodedString = new UnicodeEncoding(false, true).GetString(result);
        var document = XElement.Parse(decodedString);
    }

If I run this test I get an XmlException with message : "Data at the root level is invalid. Line 1, position 1." As it turns out the problem is "0xFFFE" preamble which is considered as invalid character. Note that if I use commented string instead, everything works just fine, which is strange as per me. Looks like SqlServer stores XML strings in UCS-2 with a BOM, and at the same time it stores nvarchar values without it. The main question is: how can I decode this byte array to string which will not contain this preamble (BOM)?


Solution

  • In case anyone will need this in future, the following code works:

    using(var ms = new MemoryStream(result))
    {
        using (var sr = new StreamReader(ms, Encoding.Unicode, true))
        {
            decodedString = sr.ReadToEnd();
        }
    }