I am attempting to process a large XML document (using an XmlReader) in a single pass, and deserialize only certain elements in it using an XmlSerializer.
Below is some code and a tiny mock XML document showing how I have attempted to do this.
Rationale for using XmlReader:
1. I am dealing with very large XML documents (10–250 MB), which I do not want to load into memory in their entirety. So XmlDocument is out of the question.
2. I want to extract only certain elements. Typically I will be able to ignore most other content. XmlReader appears to give me an efficient means of skipping irrelevant content.
3. I do not know in advance which, if any, of the elements that I can deal with will be present; therefore I am not using a bunch of XPath/XQuery or LINQ to XML-based queries, because I want to make only a single pass over the XML files (due to their size).
public class ElementOfInterest { }
…
var xml = @"<?xml version='1.0' encoding='utf-8' ?>
<Root xmlns:ex='urn:stakx:example'
xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'>
<ElementOfInterest xsi:type='ex:ElementOfInterest' />
</Root>";
var reader = System.Xml.XmlReader.Create(new System.IO.StringReader(xml));
reader.ReadToFollowing("ElementOfInterest");
var serializer = new System.Xml.Serialization.XmlSerializer(typeof(ElementOfInterest));
serializer.Deserialize(reader.ReadSubtree());
The last line of code throws the following inner exception:
InvalidOperationException: "Namespace prefix ex is not defined."
Obviously, the XmlSerializer doesn't recognise the ex namespace prefix inside the xsi:type attribute's value.
This is just one error I am having, but frankly, the larger problem is that I have no idea how to go about the whole namespace issue. I am simply looking for a convenient way to deserialize just a single node out of the XML document, but that seems to entail having to manually register/manage namespaces, and to somehow forward them from the XmlReader to the XmlSerializer.
Can someone demonstrate how to deserialize a single node from an XML document read with an XmlReader, either by pointing out the error in my code, or by showing an alternative approach?
The following works:
using System.IO;
using System.Xml;
using System.Xml.Serialization;

class Program
{
    static void Main()
    {
        var xml = @"<?xml version='1.0' encoding='utf-8' ?>
<Root
    xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'
    xmlns:ex='urn:stakx:example'
>
    <ex:ElementOfInterest xsi:type='ex:ElementOfInterest' />
</Root>";

        // Register the ex prefix in a namespace manager and pass it
        // to the reader through a parser context.
        var nt = new NameTable();
        var mgr = new XmlNamespaceManager(nt);
        mgr.AddNamespace("ex", "urn:stakx:example");
        var ctxt = new XmlParserContext(nt, mgr, "", XmlSpace.Default);
        var reader = XmlReader.Create(new StringReader(xml), null, ctxt);
        var serializer = new XmlSerializer(typeof(ElementOfInterest));

        // Advance to the first ElementOfInterest element in the
        // urn:stakx:example namespace, then deserialize only its subtree.
        reader.ReadToFollowing("ElementOfInterest", "urn:stakx:example");
        var eoi = (ElementOfInterest)serializer.Deserialize(reader.ReadSubtree());
    }
}

// The root namespace must match the namespace of the element in the input.
[XmlRoot(Namespace = "urn:stakx:example")]
public class ElementOfInterest { }
Note the namespace prefix on the element in the input: <ex:ElementOfInterest>.
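Since the question is about a single pass over a large file, the same idea can probably be extended to a loop that picks up every matching element and skips everything else. The following is only a sketch under that assumption, not a tested solution for real 250 MB inputs. The names SinglePassDemo and ReadAll and the Other element in the sample document are made up for illustration; the reader and serializer setup is copied from the code above.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Serialization;

[XmlRoot(Namespace = "urn:stakx:example")]
public class ElementOfInterest { }

public static class SinglePassDemo
{
    // Hypothetical helper: deserializes every ex:ElementOfInterest
    // element found in one forward-only pass over the input.
    public static List<ElementOfInterest> ReadAll(TextReader input)
    {
        // Same reader setup as in the answer above.
        var nt = new NameTable();
        var mgr = new XmlNamespaceManager(nt);
        mgr.AddNamespace("ex", "urn:stakx:example");
        var ctxt = new XmlParserContext(nt, mgr, "", XmlSpace.Default);

        var serializer = new XmlSerializer(typeof(ElementOfInterest));
        var results = new List<ElementOfInterest>();

        using (var reader = XmlReader.Create(input, null, ctxt))
        {
            // ReadToFollowing moves forward past all other content
            // until the next matching element, or returns false at the end.
            while (reader.ReadToFollowing("ElementOfInterest", "urn:stakx:example"))
            {
                // Disposing the subtree reader leaves the outer reader
                // at the end of the element that was just deserialized.
                using (var subtree = reader.ReadSubtree())
                {
                    results.Add((ElementOfInterest)serializer.Deserialize(subtree));
                }
            }
        }

        return results;
    }

    public static void Main()
    {
        // Made-up sample document with one irrelevant element in between.
        var xml = @"<?xml version='1.0' encoding='utf-8' ?>
<Root
    xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'
    xmlns:ex='urn:stakx:example'
>
    <ex:ElementOfInterest xsi:type='ex:ElementOfInterest' />
    <Other><Nested /></Other>
    <ex:ElementOfInterest xsi:type='ex:ElementOfInterest' />
</Root>";

        var items = ReadAll(new StringReader(xml));
        Console.WriteLine(items.Count);
    }
}
```

For a real file, ReadAll could be given a StreamReader over the file instead of a StringReader, and a single XmlSerializer instance is reused for all elements.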