c# traversal cdata xml-deserialization

XmlDeserialize cdata and its siblings


I'm deserializing a custom, inflexible XML schema into C# objects so that I can traverse and migrate the data within it.

A brief example:

  <Source>
    ...
    <Provider>
      <![CDATA[read 1]]>
      <Identifier><![CDATA[read 2]]></Identifier>
      <IdentificationScheme><![CDATA[read 3]]></IdentificationScheme>
    </Provider>
    ...
  </Source>

I'm looking to deserialize the Provider element with the first CDATA value, read 1, and its sibling element values too, read 2 and read 3.

Running this through http://xmltocsharp.azurewebsites.net/ produces the following classes:

[XmlRoot(ElementName = "Provider")]
public class Provider
{
    [XmlElement(ElementName = "Identifier")]
    public string Identifier { get; set; }
    [XmlElement(ElementName = "IdentificationScheme")]
    public string IdentificationScheme { get; set; }
}

[XmlRoot(ElementName = "Source")]
public class Source
{
    [XmlElement(ElementName = "Provider")]
    public Provider Provider { get; set; }
}

But it fails to account for the first CDATA value; in fact, I think that when deserialized like this the value would not be reachable at all.

I think this may also depend on which XML deserializer I use. I was planning on RestSharp's (as the website already uses that library) or System.Xml.Linq.XDocument, but I'm not sure whether either can handle this scenario.

In my searches I couldn't find an example either, but Stack Overflow did suggest the question "<!{CDATA[]]> and <ELEMENT> in a xml element", which describes precisely the same schema situation.

Thanks so much for any help in advance,

EDIT 1: As far as I can tell, [XmlText] is the solution required, as pointed out in Marc Gravell's answer below, but it does not work (or is not implemented) in RestSharp's XmlDeserializer. Further testing would be required to ascertain that for sure.


Solution

  • The CDATA is essentially just escaping syntax and is handled by most readers. What you are looking for is:

    [XmlText]
    public string WhateverThisIs { get; set; }
    

    on the object that has raw content. By adding that to Provider, WhateverThisIs gets the value of "read 1". The other 2 properties already deserialize correctly as "read 2" and "read 3" without you having to do anything.

    For reference, everything here would behave almost the same without the CDATA (there are some whitespace issues):

    <Provider>
      read 1
      <Identifier>read 2</Identifier>
      <IdentificationScheme>read 3</IdentificationScheme>
    </Provider>
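
Putting the pieces together, here is a minimal sketch of the approach described above, assuming System.Xml.Serialization.XmlSerializer is used rather than RestSharp's deserializer. The property name Text, the ProviderDemo helper class, and the SampleXml constant are hypothetical names introduced only for illustration; the XML is the Provider fragment from the question.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

[XmlRoot(ElementName = "Provider")]
public class Provider
{
    // Receives the text/CDATA content that sits directly inside <Provider>.
    // "Text" is a hypothetical name; any property name works with [XmlText].
    [XmlText]
    public string Text { get; set; }

    [XmlElement(ElementName = "Identifier")]
    public string Identifier { get; set; }

    [XmlElement(ElementName = "IdentificationScheme")]
    public string IdentificationScheme { get; set; }
}

// Hypothetical helper, not part of the original question or answer.
public static class ProviderDemo
{
    // The Provider fragment from the question, used as sample input.
    public const string SampleXml = @"<Provider>
  <![CDATA[read 1]]>
  <Identifier><![CDATA[read 2]]></Identifier>
  <IdentificationScheme><![CDATA[read 3]]></IdentificationScheme>
</Provider>";

    // Deserializes a <Provider> element with the built-in XmlSerializer.
    public static Provider Parse(string xml)
    {
        var serializer = new XmlSerializer(typeof(Provider));
        using (var reader = new StringReader(xml))
        {
            return (Provider)serializer.Deserialize(reader);
        }
    }
}
```

Calling ProviderDemo.Parse(ProviderDemo.SampleXml) should populate all three properties. Because of the whitespace issues mentioned above, it is probably safest to call Trim() on Text before comparing or storing it, since the surrounding line breaks and indentation may end up in the value depending on the reader settings.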