I'm trying to pull data out of a file in order to split it. For each split file I need the Data root element plus the Header, Reference and Sender elements; every file will contain these elements plus a single Item element.
<Data xmlns="http://foo.schema">
  <Header>
    <Contact>
      <Id>MS0199</Id>
    </Contact>
    <ContactRef>GA</ContactRef>
  </Header>
  <Reference>RA0122</Reference>
  <Sender>
    <SubNo>19223</SubNo>
  </Sender>
  <Item id="TA001">
    <Code>IAT-0100</Code>
  </Item>
  <Item id="TA054">
    <Code>MFF-2910</Code>
  </Item>
</Data>
I'm extracting the "header data" (Data, Header, Reference, Sender) like this:
using System.Xml;
using System.Xml.Linq;

XElement hdrval = null, refval = null, sndval = null;
using (XmlReader reader = XmlReader.Create(xmlFile))
{
    reader.MoveToContent();
    while (reader.Read())
    {
        if (reader.NodeType == XmlNodeType.Element && reader.Name == "Header")
        {
            hdrval = XElement.Load(reader.ReadSubtree());
        }
        if (reader.NodeType == XmlNodeType.Element && reader.Name == "Reference")
        {
            refval = XElement.Load(reader.ReadSubtree());
        }
        if (reader.NodeType == XmlNodeType.Element && reader.Name == "Sender")
        {
            sndval = XElement.Load(reader.ReadSubtree());
        }
        if (reader.NodeType == XmlNodeType.Element && reader.Name == "Item")
        {
            break;
        }
    }
}
XNamespace ns = "http://foo.schema";
XElement Header = new(ns + "Data",
    hdrval,
    refval,
    sndval
);
The problem is that the resulting output looks like this:
<Data xmlns="http://foo.schema">
  <Header xmlns="http://foo.schema">
    <Contact>
      <Id>MS0199</Id>
    </Contact>
    <ContactRef>GA</ContactRef>
  </Header>
  <Reference xmlns="http://foo.schema">RA0122</Reference>
  <Sender xmlns="http://foo.schema">
    <SubNo>19223</SubNo>
  </Sender>
</Data>
Is there a way to stop it from injecting the xmlns declaration on every child element I pull out? I want it on the Data element, since that's the root element, but not on every single child.
It works if I use LINQ to XML (XElement) directly, but the file can be multiple GB in size, so I'm using XmlReader to stream the file rather than loading the whole thing into memory at once.
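Your problem is that ReadSubtree() returns a reader that presents the current node as if it were the root of its own well-formed XML document. Therefore it inserts a namespace declaration attribute for the current element's namespace unless one is there already. XElement.Load() then copies that xmlns attribute into the loaded element, which is why it is written out again on each child of Data.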
If you don't want that, avoid ReadSubtree() and call XNode.ReadFrom() directly:
using (var reader = XmlReader.Create(xmlFile, new XmlReaderSettings { IgnoreWhitespace = true }))
{
    reader.MoveToContent();
    while (!reader.EOF)
    {
        if (reader.NodeType == XmlNodeType.Element && reader.Name == "Header")
        {
            hdrval = (XElement)XNode.ReadFrom(reader);
            // ReadFrom() leaves the reader positioned AFTER the end element, so no need for an additional Read()
        }
        else if (reader.NodeType == XmlNodeType.Element && reader.Name == "Reference")
        {
            refval = (XElement)XNode.ReadFrom(reader);
        }
        else if (reader.NodeType == XmlNodeType.Element && reader.Name == "Sender")
        {
            sndval = (XElement)XNode.ReadFrom(reader);
        }
        else if (reader.NodeType == XmlNodeType.Element && reader.Name == "Item")
        {
            break; // Stop reading at the first <Item id="xxx"> node
        }
        else
        {
            reader.Read();
        }
    }
}
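To see the difference for yourself, here is a small self-contained sketch. It assumes .NET 6 or later with top-level statements, inlines a trimmed copy of the sample XML as a string instead of reading xmlFile, and uses LoadHeader as a hypothetical helper written only for this demo. It loads <Header> once through ReadSubtree() and once through XNode.ReadFrom(), then checks which result carries an explicit namespace declaration attribute:

```csharp
using System;
using System.IO;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

// Trimmed copy of the question's input, inlined instead of reading xmlFile (demo assumption).
const string xml = @"<Data xmlns=""http://foo.schema"">
  <Header><Contact><Id>MS0199</Id></Contact><ContactRef>GA</ContactRef></Header>
  <Reference>RA0122</Reference>
</Data>";

// Hypothetical demo helper: loads <Header> either via ReadSubtree() or via XNode.ReadFrom().
XElement LoadHeader(bool useSubtree)
{
    using var reader = XmlReader.Create(new StringReader(xml),
        new XmlReaderSettings { IgnoreWhitespace = true });
    if (!reader.ReadToFollowing("Header", "http://foo.schema"))
        throw new InvalidOperationException("No <Header> element found.");
    if (useSubtree)
    {
        // The subtree reader presents <Header> as a document root, so it supplies xmlns="http://foo.schema".
        using var subReader = reader.ReadSubtree();
        return XElement.Load(subReader);
    }
    // ReadFrom() only copies the attributes actually present on <Header> in the source (none here).
    return (XElement)XNode.ReadFrom(reader);
}

XElement viaSubtree = LoadHeader(useSubtree: true);
XElement viaReadFrom = LoadHeader(useSubtree: false);

bool subtreeHasNsDecl = viaSubtree.Attributes().Any(a => a.IsNamespaceDeclaration);
bool readFromHasNsDecl = viaReadFrom.Attributes().Any(a => a.IsNamespaceDeclaration);

// Rebuild the root the same way as in the question, using the ReadFrom() result.
XNamespace ns = "http://foo.schema";
string combined = new XElement(ns + "Data", viaReadFrom).ToString();

Console.WriteLine($"ReadSubtree adds xmlns: {subtreeHasNsDecl}; ReadFrom adds xmlns: {readFromHasNsDecl}");
Console.WriteLine(combined);
```

Only the ReadSubtree() version should report an xmlns attribute, and the rebuilt <Data> should then declare the namespace once, on the root.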
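Notes:
ReadSubtree() leaves the original reader positioned on the EndElement node (once the subtree reader has been closed), but ReadFrom() leaves it positioned immediately after the EndElement node. Thus switching from one to the other requires rewriting your loop logic slightly, as unconditionally reading the next node may end up skipping some content: with IgnoreWhitespace = true, for instance, a Read() right after loading <Header> steps past the <Reference> start tag.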
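A minimal sketch of that positioning difference, assuming .NET 6+ top-level statements, a trimmed copy of the sample XML inlined as a string, and IgnoreWhitespace = true. Open is a hypothetical helper that positions a fresh reader on <Header>. The sketch records where the outer reader ends up after each approach, and shows a while (reader.Read()) loop that never sees <Reference> because ReadFrom() has already moved onto it:

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Linq;

// Trimmed sample input, inlined for the demo (assumption).
const string xml = @"<Data xmlns=""http://foo.schema"">
  <Header><ContactRef>GA</ContactRef></Header>
  <Reference>RA0122</Reference>
</Data>";

// Hypothetical helper: a new reader positioned on the <Header> start tag.
XmlReader Open()
{
    var reader = XmlReader.Create(new StringReader(xml),
        new XmlReaderSettings { IgnoreWhitespace = true });
    reader.ReadToFollowing("Header", "http://foo.schema");
    return reader;
}

// 1) ReadSubtree(): once the subtree reader is closed, the outer reader sits on </Header>.
string afterSubtree;
using (var reader = Open())
{
    using (var subReader = reader.ReadSubtree())
        XElement.Load(subReader);
    afterSubtree = $"{reader.NodeType} {reader.LocalName}";
}

// 2) XNode.ReadFrom(): the outer reader has already moved past </Header>, onto <Reference>.
string afterReadFrom;
using (var reader = Open())
{
    XNode.ReadFrom(reader);
    afterReadFrom = $"{reader.NodeType} {reader.LocalName}";
}

// 3) The pitfall: an unconditional Read() after ReadFrom() steps over the <Reference> start tag.
bool sawReference = false;
using (var reader = Open())
{
    XNode.ReadFrom(reader);   // consumes <Header>, lands on <Reference>
    while (reader.Read())     // the first Read() moves into Reference's text node
    {
        if (reader.NodeType == XmlNodeType.Element && reader.LocalName == "Reference")
            sawReference = true;
    }
}

Console.WriteLine(afterSubtree);   // EndElement Header
Console.WriteLine(afterReadFrom);  // Element Reference
Console.WriteLine(sawReference);   // False
```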
When you do use ReadSubtree(), you should take note of the following documentation remark:
You should not perform any operations on the original reader until the new reader has been closed. This action is not supported and can result in unpredictable behavior.
For example:
using (var subReader = reader.ReadSubtree())
    hdrval = XElement.Load(subReader);
I've noticed that XNode.ReadFrom(reader) is loading the insignificant whitespace from your input XML as text nodes in your XElement hierarchy. If you don't want that, create your XmlReader with XmlReaderSettings.IgnoreWhitespace = true:
Gets or sets a value indicating whether to ignore insignificant white space.
For example:
using (var reader = XmlReader.Create(
    xmlFile,
    new XmlReaderSettings { IgnoreWhitespace = true }))
{
    // ... read with XNode.ReadFrom() as shown above ...
}
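One way to observe the whitespace effect is sketched below. It assumes .NET 6+ top-level statements and inlines the indented sample as a string; CountTextNodesInHeader is a hypothetical helper that loads <Header> with XNode.ReadFrom() and counts the XText nodes underneath it, with and without IgnoreWhitespace:

```csharp
using System;
using System.IO;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

// Indented sample as in the question: the line breaks and spaces between elements are insignificant whitespace.
const string xml = @"<Data xmlns=""http://foo.schema"">
  <Header>
    <Contact>
      <Id>MS0199</Id>
    </Contact>
    <ContactRef>GA</ContactRef>
  </Header>
</Data>";

// Hypothetical helper: loads <Header> via XNode.ReadFrom() and counts every XText node below it.
int CountTextNodesInHeader(bool ignoreWhitespace)
{
    var settings = new XmlReaderSettings { IgnoreWhitespace = ignoreWhitespace };
    using var reader = XmlReader.Create(new StringReader(xml), settings);
    reader.ReadToFollowing("Header", "http://foo.schema");
    var header = (XElement)XNode.ReadFrom(reader);
    // Includes indentation-only XText nodes as well as real values such as "MS0199".
    return header.DescendantNodes().OfType<XText>().Count();
}

int withWhitespace = CountTextNodesInHeader(ignoreWhitespace: false);
int withoutWhitespace = CountTextNodesInHeader(ignoreWhitespace: true);
Console.WriteLine($"{withWhitespace} text nodes by default, {withoutWhitespace} with IgnoreWhitespace = true");
```

With IgnoreWhitespace = true only the real text values (MS0199 and GA) remain; without it, each run of indentation between elements shows up as an extra XText node.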
Demo fiddle here.
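Since the goal is to split the file, the same ReadFrom() pattern can keep streaming past the first <Item> instead of breaking out of the loop. This is a sketch, not a tested production implementation: it assumes .NET 6+ top-level statements, inlines the sample XML instead of reading xmlFile, and collects each output <Data> element in a list, where real code would more likely Save() each one to its own file. SplitByItem is a hypothetical helper name.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

// Sample input from the question, inlined for the demo (assumption).
const string xml = @"<Data xmlns=""http://foo.schema"">
  <Header><Contact><Id>MS0199</Id></Contact><ContactRef>GA</ContactRef></Header>
  <Reference>RA0122</Reference>
  <Sender><SubNo>19223</SubNo></Sender>
  <Item id=""TA001""><Code>IAT-0100</Code></Item>
  <Item id=""TA054""><Code>MFF-2910</Code></Item>
</Data>";

// Hypothetical helper: streams the input once, yielding one <Data> element per <Item>.
IEnumerable<XElement> SplitByItem(XmlReader reader)
{
    XNamespace ns = "http://foo.schema";
    XElement? hdrval = null, refval = null, sndval = null;
    reader.MoveToContent();
    while (!reader.EOF)
    {
        if (reader.NodeType != XmlNodeType.Element || reader.NamespaceURI != ns.NamespaceName)
        {
            reader.Read();
            continue;
        }
        switch (reader.LocalName)
        {
            case "Header": hdrval = (XElement)XNode.ReadFrom(reader); break;
            case "Reference": refval = (XElement)XNode.ReadFrom(reader); break;
            case "Sender": sndval = (XElement)XNode.ReadFrom(reader); break;
            case "Item":
                var item = (XElement)XNode.ReadFrom(reader);
                // Header elements already attached to an earlier <Data> are cloned automatically when added.
                yield return new XElement(ns + "Data", hdrval, refval, sndval, item);
                break;
            default:
                reader.Read(); // e.g. the root <Data> start tag: step inside it
                break;
        }
    }
}

var outputs = new List<XElement>();
using (var reader = XmlReader.Create(new StringReader(xml), new XmlReaderSettings { IgnoreWhitespace = true }))
{
    foreach (var data in SplitByItem(reader))
        outputs.Add(data);
}

foreach (var data in outputs)
    Console.WriteLine(data);
```

Because XElement clones any node that already has a parent when you add it, the same hdrval, refval and sndval can be reused for every item without copying them by hand. Matching on LocalName and NamespaceURI rather than Name is a design choice that keeps the checks working even if the input binds http://foo.schema to a prefix.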