I would like to implement code that deserializes an XML document into a list of objects. I found a problem in the code below: the while condition reads forward, so every other node is skipped. What is the proper way to check for a next node in the XML in the while loop of this code?
private Task<List<TAxEntity>> Deserialize(XmlReader reader)
{
    var deserializer = new XmlSerializer(typeof(TAxEntity));
    var entities = new List<TAxEntity>();
    do
    {
        using (var stringReader = new StringReader(reader.ReadOuterXml()))
        {
            var entity = (TAxEntity)deserializer.Deserialize(stringReader);
            entities.Add(entity);
        }
    }
    while (reader.ReadToNextSibling(EntityElementName));
    return Task.FromResult(entities);
}
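ReadOuterXml() leaves the reader positioned on the node that follows the element it just read, so when there is no whitespace between elements the reader is already on the next entity, and ReadToNextSibling() then skips over it. To check whether an XmlReader is already correctly positioned, you can test whether reader.NodeType == XmlNodeType.Element and reader.Name == EntityElementName. If it is, do not scan forward using ReadToNextSibling().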
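To make the positioning issue concrete, here is a minimal sketch that reports which node type the reader is on immediately after ReadOuterXml() has consumed the first element. The element name "item", the sample documents, and the PositionDemo class are hypothetical and exist only for this illustration.

```csharp
using System;
using System.IO;
using System.Xml;

public static class PositionDemo
{
    // Returns the type of the node the reader is positioned on right after
    // ReadOuterXml() has consumed the first <item> element.
    public static XmlNodeType NodeTypeAfterFirstItem(string xml)
    {
        using (var reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.ReadToFollowing("item"); // position on the first <item>
            reader.ReadOuterXml();          // reads <item>...</item> and advances past it
            return reader.NodeType;
        }
    }

    public static void Main()
    {
        // Unindented: the node after the first <item> is the second <item>.
        Console.WriteLine(NodeTypeAfterFirstItem("<root><item>1</item><item>2</item></root>"));
        // Indented: the node after the first <item> is a whitespace node.
        Console.WriteLine(NodeTypeAfterFirstItem("<root>\n  <item>1</item>\n  <item>2</item>\n</root>"));
    }
}
```

With default reader settings, the unindented document should leave the reader on an Element node (the next entity), while the indented one should leave it on a Whitespace node. That difference is why an unconditional ReadToNextSibling() appears to work on indented input and skips entities on unindented input.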
However, there are a few improvements to be made to your algorithm:
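Instead of checking for the correct reader.Name, check whether the LocalName and NamespaceURI are as expected, and if not, call reader.ReadToNextSibling(string localName, string namespaceURI). reader.Name includes the namespace prefix, and the prefix is arbitrary: two documents can bind different prefixes to the same namespace and still be semantically identical, so code that hardcodes a prefix will break on valid input.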
Rather than ReadOuterXml(), call reader.ReadSubtree() and pass the returned reader directly to deserializer.Deserialize(). Your current algorithm parses the XML, reformats it into a second XML string, then parses that string a second time. Using ReadSubtree() allows the XmlSerializer to stream a nested element directly from the incoming XmlReader and so avoids this extra parsing and reformatting.
Putting all this together, you can introduce the following lower-level extension method:
using System;
using System.Collections.Generic;
using System.Xml;
using System.Xml.Serialization;

public static class XmlReaderExtensions
{
    public static IEnumerable<TElement> DeserializeSequence<TElement>(this XmlReader reader, string localEntityElementName, string namespaceURI)
    {
        if (reader == null)
            throw new ArgumentNullException(nameof(reader));
        var deserializer = new XmlSerializer(typeof(TElement));
        // Only scan forward when the reader is not already positioned on a matching element.
        while ((reader.NodeType == XmlNodeType.Element && reader.LocalName == localEntityElementName && reader.NamespaceURI == namespaceURI)
               || reader.ReadToNextSibling(localEntityElementName, namespaceURI))
        {
            // Using ReadSubtree() instead of ReadOuterXml() avoids having to parse, reformat, then parse the formatted XML a second time
            // by reading directly from the current stream only once.
            TElement element;
            using (var subReader = reader.ReadSubtree())
            {
                element = (TElement)deserializer.Deserialize(subReader);
            }
            // Consume the EndElement also (or move past the current element if reader.IsEmptyElement).
            reader.Read();
            yield return element;
        }
    }
}
And modify your Deserialize() method to be as follows:
// ToList() requires "using System.Linq;" at the top of the file.
private Task<List<TAxEntity>> Deserialize(XmlReader reader)
{
    var entities = reader.DeserializeSequence<TAxEntity>(EntityElementName, "" /* Pass the correct namespace here */).ToList();
    return Task.FromResult(entities);
}
Note that any manual XmlReader code should be unit-tested with both indented and unindented XML, since bugs that involve skipping nodes are sometimes masked when parsing indented XML (because the whitespace nodes get skipped).
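As a rough sketch of such a check, the program below runs the same sequence-reading logic over an unindented and an indented version of one document and prints the IDs it finds. The Item type, the Items/Item/Id element names, and the SequenceCheck class are assumptions made up for this example; the loop is a compact copy of the DeserializeSequence() logic above so that the sketch stands alone.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical entity type, used only for this illustration.
[XmlRoot("Item")]
public class Item
{
    public int Id { get; set; }
}

public static class SequenceCheck
{
    // Compact copy of the DeserializeSequence() loop, repeated here so the sketch is self-contained.
    static IEnumerable<T> DeserializeSequence<T>(XmlReader reader, string localName, string namespaceURI)
    {
        var serializer = new XmlSerializer(typeof(T));
        while ((reader.NodeType == XmlNodeType.Element && reader.LocalName == localName && reader.NamespaceURI == namespaceURI)
               || reader.ReadToNextSibling(localName, namespaceURI))
        {
            T element;
            using (var subReader = reader.ReadSubtree())
            {
                element = (T)serializer.Deserialize(subReader);
            }
            reader.Read(); // move past the EndElement (or the empty element)
            yield return element;
        }
    }

    public static List<int> ReadIds(string xml)
    {
        using (var reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.MoveToContent(); // position on <Items>
            reader.Read();          // move to the first child node of <Items>
            return DeserializeSequence<Item>(reader, "Item", "").Select(i => i.Id).ToList();
        }
    }

    public static void Main()
    {
        var unindented = "<Items><Item><Id>1</Id></Item><Item><Id>2</Id></Item><Item><Id>3</Id></Item></Items>";
        var indented = "<Items>\n  <Item>\n    <Id>1</Id>\n  </Item>\n  <Item>\n    <Id>2</Id>\n  </Item>\n  <Item>\n    <Id>3</Id>\n  </Item>\n</Items>";
        Console.WriteLine(string.Join(",", ReadIds(unindented)));
        Console.WriteLine(string.Join(",", ReadIds(indented)));
    }
}
```

Both inputs should yield the same three IDs. In a real test project the two Console.WriteLine calls would be replaced by assertions in whatever test framework you use.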