XmlNode InnerXml vs OuterXml


I've come across a bizarre situation and I'm hoping that someone who understands better than I do can help me to resolve it.

I'm inserting an image into an Xml document such that it can be opened with Microsoft Word. As part of this, I need to add an Xml 'Relationship' which maps to the element containing the image. Straightforward enough.

I'm adding the node that should look like this:

<Relationship Id="rId6" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/image" Target="media/image1.png"  />

However, in the final .doc file, the same line appears as this:

<Relationship Id="rId6" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/image" Target="media/image1.png" xmlns="" />

i.e. it now has an empty xmlns="" attribute.

This is sufficient for Word to believe that the document is corrupt and refuse to open. If I manually open the file and delete that attribute, the file opens.

Clearly, I want to remove it programmatically :-) So I found the parent node. This is where my understanding is a little hazy. I believed that the OuterXml property contains the node plus the contents of all its children, while InnerXml contains only the children.

Here's what I'm seeing (the backslash escapes are there because I copied this from the text viewer in Visual Studio).

OuterXml:

"<Relationships xmlns=\"http://schemas.openxmlformats.org/package/2006/relationships\">
<Relationship Id=\"rId3\"     Type=\"http://schemas.openxmlformats.org/officeDocument/2006/relationships/webSettings\" Target=\"webSettings.xml\" />
 <Relationship Id=\"rId2\" Type=\"http://schemas.openxmlformats.org/officeDocument/2006/relationships/settings\" Target=\"settings.xml\" />
<Relationship Id=\"rId1\" Type=\"http://schemas.openxmlformats.org/officeDocument/2006/relationships/styles\" Target=\"styles.xml\" />
<Relationship Id=\"rId5\" Type=\"http://schemas.openxmlformats.org/officeDocument/2006/relationships/theme\" Target=\"theme/theme1.xml\" />
<Relationship Id=\"rId4\" Type=\"http://schemas.openxmlformats.org/officeDocument/2006/relationships/fontTable\" Target=\"fontTable.xml\" />
<Relationship Id=\"rId6\" Type=\"http://schemas.openxmlformats.org/officeDocument/2006/relationships/image\" Target=\"media/image1.png\" xmlns=\"\" />

</Relationships>"

InnerXml:

"<Relationship Id=\"rId3\" Type=\"http://schemas.openxmlformats.org/officeDocument/2006/relationships/webSettings\" Target=\"webSettings.xml\" xmlns=\"http://schemas.openxmlformats.org/package/2006/relationships\" />
<Relationship Id=\"rId2\" Type=\"http://schemas.openxmlformats.org/officeDocument/2006/relationships/settings\" Target=\"settings.xml\" xmlns=\"http://schemas.openxmlformats.org/package/2006/relationships\" />
<Relationship Id=\"rId1\" Type=\"http://schemas.openxmlformats.org/officeDocument/2006/relationships/styles\" Target=\"styles.xml\" xmlns=\"http://schemas.openxmlformats.org/package/2006/relationships\" />
<Relationship Id=\"rId5\" Type=\"http://schemas.openxmlformats.org/officeDocument/2006/relationships/theme\" Target=\"theme/theme1.xml\" xmlns=\"http://schemas.openxmlformats.org/package/2006/relationships\" />
<Relationship Id=\"rId4\" Type=\"http://schemas.openxmlformats.org/officeDocument/2006/relationships/fontTable\" Target=\"fontTable.xml\" xmlns=\"http://schemas.openxmlformats.org/package/2006/relationships\" />
<Relationship Id=\"rId6\" Type=\"http://schemas.openxmlformats.org/officeDocument/2006/relationships/image\" Target=\"media/image1.png\" />"

Note how the 6th and final element has the erroneous xmlns="" in the OuterXml, but not within the InnerXml. I can easily change the InnerXml, but not the OuterXml.

So, my ultimate question is "how do I get rid of this added attribute?", but I'm also hoping someone can explain why InnerXml and OuterXml differ (aside from the container element).


Solution

  • How are you adding the node to your document? It looks like this is happening because that element has no namespace (unlike the other elements which have a namespace of "http://schemas.openxmlformats.org/package/2006/relationships"). Keep in mind that namespaces aren't like "normal" attributes and are essential to the "identity" of a tag.

    In the "OuterXml" example, the first 5 Relationship nodes all have the same namespace as the parent element, so they don't need to declare it explicitly. The 6th node has no namespace, hence xmlns="", which resets the inherited default namespace.

    In the "InnerXml" example, the first 5 nodes all have the same namespace, but with no parent to inherit from, they each declare it explicitly. The 6th node still has the blank namespace; with no default namespace in scope there, it needs no xmlns attribute at all.

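    In summary: the document isn't corrupt because of the string 'xmlns=""'; it is corrupt because a Relationship element must have a namespace of "http://schemas.openxmlformats.org/package/2006/relationships". The fix is to create the new element in that namespace (in .NET, for example, with the XmlDocument.CreateElement overload that takes a namespace URI) rather than to strip the attribute afterwards.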
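
For illustration only, here is a minimal sketch of that idea using Python's standard xml.etree.ElementTree, not the .NET XmlNode API from the question. It creates the new Relationship in the package namespace and checks that no xmlns="" shows up on output. The empty starting document and the variable names are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

PKG_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
IMAGE_TYPE = "http://schemas.openxmlformats.org/officeDocument/2006/relationships/image"

# Ask ElementTree to write PKG_NS as the default (unprefixed) namespace.
ET.register_namespace("", PKG_NS)

# Assumed starting point: an empty Relationships container.
root = ET.fromstring(f'<Relationships xmlns="{PKG_NS}"></Relationships>')

# ElementTree spells a namespaced tag as "{uri}local", so the new element
# ends up in the same namespace as its parent.
ET.SubElement(
    root,
    f"{{{PKG_NS}}}Relationship",
    {"Id": "rId6", "Type": IMAGE_TYPE, "Target": "media/image1.png"},
)

serialized = ET.tostring(root, encoding="unicode")
print(serialized)
```

Because the child shares the parent's namespace, the serializer has nothing to reset, so only the single declaration on the container is written.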

    To better illustrate, here's a sample XML document.

    <root xmlns="urn:foo:bar" xmlns:ns1="urn:baz">
        <item />
        <ns1:item />
        <item xmlns="" />
    </root>
    
    • The namespace of the root element is "urn:foo:bar"
    • The namespace of the 1st item element is "urn:foo:bar"
    • The namespace of the 2nd item element is "urn:baz"
    • The namespace of the 3rd item element is ""

    If you were to get the "inner xml" of the root tag it might look something like this:

    <item xmlns="urn:foo:bar" />
    <item xmlns="urn:baz" />
    <item xmlns="" />
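
The same effect can be reproduced outside .NET. The sketch below uses Python's standard xml.etree.ElementTree (an assumption for illustration; its output formatting differs from XmlNode.InnerXml, and it invents prefixes such as ns0 where .NET would use a default namespace). It parses the sample document, then serializes each child on its own, without the parent that declared the namespaces.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<root xmlns="urn:foo:bar" xmlns:ns1="urn:baz">
    <item />
    <ns1:item />
    <item xmlns="" />
</root>"""

root = ET.fromstring(SAMPLE)

# ElementTree stores the namespace inside the tag as "{uri}local".
tags = [child.tag for child in root]

# Serialize every child separately; strip() drops the whitespace that
# followed each element in the source document.
fragments = [ET.tostring(child, encoding="unicode").strip() for child in root]

for tag, fragment in zip(tags, fragments):
    print(tag, "->", fragment)
```

The first two fragments have to carry their own namespace declaration once the parent is gone, while the third, which has no namespace, needs none.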
    

    As mentioned above, the namespace is an integral part of a tag's "identity" or whatever you want to call it. The following documents are all functionally identical:

    <foo:root xmlns:foo="urn:foo" xmlns:bar="urn:bar">
        <foo:element />
        <bar:element />
    </foo:root>
    
    <root xmlns="urn:foo" xmlns:bar="urn:bar">
        <element />
        <bar:element />
    </root>
    
    <root xmlns="urn:foo">
        <element />
        <element xmlns="urn:bar" />
    </root>
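
One way to check that claim is to parse all three documents and compare the expanded names, i.e. namespace plus local name, with the prefixes thrown away. This is a small sketch using Python's standard xml.etree.ElementTree; the helper name expanded_names is made up for the example.

```python
import xml.etree.ElementTree as ET

DOCS = [
    """<foo:root xmlns:foo="urn:foo" xmlns:bar="urn:bar">
        <foo:element />
        <bar:element />
    </foo:root>""",
    """<root xmlns="urn:foo" xmlns:bar="urn:bar">
        <element />
        <bar:element />
    </root>""",
    """<root xmlns="urn:foo">
        <element />
        <element xmlns="urn:bar" />
    </root>""",
]

def expanded_names(xml_text):
    """Return every element's "{namespace}local" name in document order."""
    return [element.tag for element in ET.fromstring(xml_text).iter()]

results = [expanded_names(doc) for doc in DOCS]

for names in results:
    print(names)
```

All three lists come out the same, because a parser that understands namespaces only cares which namespace each element resolves to, not how the prefix was spelled.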