
Newtonsoft.Json DeserializeXmlNode changes tag name from number to "something"


We have a long-lived app that consumes a feed that used to be XML but was converted to JSON... Of course we were "too lazy" to change the parser from reading an XmlDocument to reading a JObject or similar, so we used "DeserializeXmlNode" to convert the JSON text to an XmlDocument. All was fine for a long, long time... until we updated from Newtonsoft.Json versions 4.5 and 6.0 to version 12.0.x and suddenly started to have problems...

let's say json looks like this:

{"version":"2.0","result":[{"mainobid":"123","typeId":"2","subobjects":{"1":{"data":"data"},"2":{"data":"data"}}}]}

what we used to get was xml having

<1><data>data</data></1><2><data>data</data></2>

tags

now... instead of the <1> tag we get something like <_x0031_>, instead of 10 there's _x0031_0, instead of 45 there's _x0034_5, and instead of 100 there's _x0031_00

Can I turn that off somehow? or am I forced now to change parsing to decode that sick x003.... thing?

INB4 1: I realize that having "1": in JSON and <1> in XML is not something anyone sane wishes to have, but I can't change that, it's an external feed

INB4 2: I know we should change parsing from XML to JSON, but as above - some laziness and re-using old code that was working 100% fine.

EDIT:

private static void TestOldNewton()
{
    var jsonstr = "{\"version\":\"2.0\",\"result\":[{\"mainobid\":\"123\",\"typeId\":\"2\",\"subobjects\":{\"1\":{\"data\":\"data\"},\"2\":{\"data\":\"data\"}}}]}";
    var doc = Newtonsoft.Json.JsonConvert.DeserializeXmlNode(jsonstr, "data");
    Console.WriteLine(doc.OuterXml);
    Console.ReadKey();
}

using packages.config like:

<?xml version="1.0" encoding="utf-8"?>
<packages>
  <package id="Newtonsoft.Json" version="6.0.1" targetFramework="net48" />
</packages>

and receiving output:

<data><version>2.0</version><result><mainobid>123</mainobid><typeId>2</typeId><subobjects><1><data>data</data></1><2><data>data</data></2></subobjects></result></data>

freshly compiled and run in a new test project.


Solution

  • The cause of the change is the following checkin, made to Json.NET 8.0.1: "Fixed converting JSON to XML with invalid XML name characters". This checkin added (among other changes) calls to XmlConvert.EncodeName() inside XmlNodeConverter.CreateElement():

    private IXmlElement CreateElement(string elementName, IXmlDocument document, string? elementPrefix, XmlNamespaceManager manager)
    {
        string encodeName = EncodeSpecialCharacters ? XmlConvert.EncodeLocalName(elementName) : XmlConvert.EncodeName(elementName);
        string ns = StringUtils.IsNullOrEmpty(elementPrefix) ? manager.DefaultNamespace : manager.LookupNamespace(elementPrefix);
    
        IXmlElement element = (!StringUtils.IsNullOrEmpty(ns)) ? document.CreateElement(encodeName, ns) : document.CreateElement(encodeName);
    
        return element;
    }
    

    This was done to [add] support for converting JSON to XML with invalid XML name characters. This applies here because element names beginning with numerals such as <1> are not well-formed XML element names, as explained in "XML tagname starting with number is not working". And in fact the XML you were previously generating was not, strictly speaking, well-formed XML.
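    To make the encoding concrete, here is a minimal sketch that calls XmlConvert.EncodeName() and XmlConvert.DecodeName() directly on the numeric keys from the question. The class and method names (XmlNameRoundTrip, Describe) are made up for illustration; only the two XmlConvert calls come from the framework.

```csharp
using System;
using System.Xml;

// Hypothetical helper, not part of Json.NET: shows what EncodeName does
// to JSON property names that are not valid XML element names.
public static class XmlNameRoundTrip
{
    // Encodes a JSON property name the way CreateElement() does when
    // EncodeSpecialCharacters is false.
    public static string Encode(string jsonPropertyName)
    {
        return XmlConvert.EncodeName(jsonPropertyName);
    }

    // Reverses the encoding to recover the original property name.
    public static string Decode(string xmlName)
    {
        return XmlConvert.DecodeName(xmlName);
    }

    // Formats "original -> encoded -> decoded" for one key.
    public static string Describe(string key)
    {
        string encoded = Encode(key);
        return key + " -> " + encoded + " -> " + Decode(encoded);
    }
}
```

    Calling XmlNameRoundTrip.Describe("10") should yield "10 -> _x0031_0 -> 10": only the leading digit is escaped, which matches the element names reported in the question. A name that is already valid, such as "data", passes through unchanged.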

    As you can see from the code excerpt above, there doesn't seem to be a way to disable this change and create element names without encoding them.

    As a workaround, since you want to create elements with numeric names like <1> anyway, you could subclass XmlTextWriter and decode the names as they are written by calling XmlConvert.DecodeName(). According to its documentation:

    This method does the reverse of the EncodeName(String) and EncodeLocalName(String) methods.

    First define the following class:

    // Requires: using System; using System.IO; using System.Xml;
    public class NameEditingXmlTextWriter : XmlTextWriter
    {
        // Maps (localName, namespace) to the element name that is actually written.
        readonly Func<string, string, string> nameEditor;
    
        public NameEditingXmlTextWriter(TextWriter writer, Func<string, string, string> nameEditor)
            : base(writer)
        {
            this.nameEditor = nameEditor;
        }
    
        // Rewrite each element name before handing it to the base writer.
        public override void WriteStartElement(string prefix, string localName, string ns)
        {
            var newLocalName = nameEditor(localName, ns);
            base.WriteStartElement(prefix, newLocalName, ns);
        }
    }
    

    Then use it as follows:

    var doc = Newtonsoft.Json.JsonConvert.DeserializeXmlNode(jsonstr, "root");
    
    var sb = new StringBuilder();
    using (var textWriter = new StringWriter(sb))
    using (var writer = new NameEditingXmlTextWriter(textWriter, (n, ns) => XmlConvert.DecodeName(n)))
    {
        doc.WriteTo(writer);
    }
    var outerXml = sb.ToString();
    

    Notes:

    • You must subclass the deprecated XmlTextWriter instead of its replacement XmlWriter because XmlWriter will throw an exception on an attempt to write a malformed XML element name such as <1>.

    • As an alternative, since Json.NET is currently licensed under the MIT License, you could fork your own version of XmlNodeConverter and remove the calls to XmlConvert.EncodeName() from CreateElement(). However, this solution seems less desirable as it creates a maintenance requirement to keep your forked version up-to-date with Newtonsoft's version.

    Demo fiddle here.
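
    If the existing parser works on the XmlDocument itself rather than on the serialized string, another option is to decode names at lookup time instead of rewriting the output. The following is only a sketch under that assumption: DecodedNameLookup and FindChild are hypothetical names, and it relies on nothing beyond XmlConvert.DecodeName() and the standard XmlNode API.

```csharp
using System;
using System.Xml;

// Hypothetical helper: finds a child element by its original (pre-encoding)
// JSON property name, e.g. "1" for an element written as <_x0031_>.
public static class DecodedNameLookup
{
    // Returns the first child element whose decoded local name equals
    // originalName, or null if there is no such child.
    public static XmlElement FindChild(XmlNode parent, string originalName)
    {
        foreach (XmlNode child in parent.ChildNodes)
        {
            var element = child as XmlElement;
            if (element != null && XmlConvert.DecodeName(element.LocalName) == originalName)
            {
                return element;
            }
        }
        return null;
    }
}
```

    With the document from the question, something like DecodedNameLookup.FindChild(subobjectsNode, "1") would stand in for a lookup of the old <1> element, where subobjectsNode is whatever node your parser already has for <subobjects>. This keeps the XML well-formed at the cost of touching the parsing code.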