
Nest xmlDoc into existing xmlTextWriter


I think I'm missing something trivial, but I've already lost a lot of time on this, so a solution may be useful to others too:

I'm working with libxml2 2.9.8 (pure C, not the C++ bindings) on Linux. I have an external (non-libxml) tree structure representing an XML file, and I'm trying to write it out as a string using libxml2. Traversing the tree and writing it with the xmlTextWriter API is trivial and works nicely (it is a struct with simple attributes, like

 typedef struct _simplifiedNode {
    char *tag;
    char *content;
    struct _simplifiedNode *parent;
    struct _simplifiedNodeList *children;
 } simplifiedNode;

), except that at a certain point I encounter a string node that may contain the string representation of an XML document. I can parse it using the xmlReadMemory API, but then I need to nest the document itself (not its escaped string representation) into the ongoing writer, including namespaces and attributes.

Is there a trivial way I am missing to do this recursively from the parsed doc/root element, without inspecting every sub-element myself?

e.g.

I'm producing the following document using the xmlTextWriter API:

<Title>
    TitleValue
</Title>
<Date>
    2018-11-26
</Date>
<Content>

The Content node in the non-libxml tree is a leaf node with tag Content, containing a string like

char *content = "<SomeXmlComplexDocument ss:someattr=\"attrval\">Somecontent</SomeXmlComplexDocument>";

What I want to achieve is, instead of having something like

<Content>&lt;SomeXmlComplexDocument&gt; ... </Content>

to parse and validate the content with xmlReadMemory and then re-inject the document, obtaining

<Content>
    <SomeXmlComplexDocument ss:someattr="attrval">Somecontent</SomeXmlComplexDocument>
</Content>

Namespaces and attributes should be preserved.


Solution

  • To serialize the inner XML fragments unescaped, you can simply use xmlTextWriterWriteRaw, which writes the string as is. It won't check whether the XML is well-formed, though. If you need validation, you'll have to parse the XML fragments at some point. Depending on the content model, you might have to use xmlParseBalancedChunkMemory instead of xmlReadMemory. It should also be possible to parse the whole result document in one go after it has been written, but then you lose information such as the original line numbers of the fragments.