I'm trying to carve out sections from hundreds of XML files. The structure of the XML docs is similar to:
<document>
<nodes>
<node id=123>pages of txt</node>
<node id=124>more example pages of txt and sub elements</node>
</nodes></document>
I'm just trying to extract all <node> elements. I have been trying to use xmlstarlet:
xmlstarlet sel -t -c "/document/nodes"
The problem is that it only returns </nodes>.
I just need to extract the following examples:
<node id=123>pages of txt</node>
<node id=124>more example pages of txt and sub elements</node>
Can anyone recommend a better option, tool or approach? Many thanks.
Your XPath is wrong: /document/nodes selects the <nodes> container, not the individual <node> elements. Use this instead:
xmlstarlet sel -t -c '//node'
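If xmlstarlet isn't a hard requirement, the same selection can be sketched with Python's standard-library xml.etree.ElementTree. This is only an illustration of how the two paths differ, not a drop-in replacement: the SAMPLE string is the quoted version of the document, and the serialize helper is a made-up name for this example.

```python
import xml.etree.ElementTree as ET

# Assumption: a small, well-formed stand-in for one of the real files.
SAMPLE = """<document>
<nodes>
<node id="123">pages of txt</node>
<node id="124">more example pages of txt and sub elements</node>
</nodes></document>"""

root = ET.fromstring(SAMPLE)  # root is the <document> element

# Rough equivalent of /document/nodes: the single <nodes> container.
containers = root.findall("nodes")

# Rough equivalent of //node: every <node> element at any depth.
nodes = root.findall(".//node")


def serialize(elements):
    """Hypothetical helper: return each element as an XML string."""
    # tostring() also emits the whitespace that follows the element,
    # so strip it to get one clean string per element.
    return [ET.tostring(el, encoding="unicode").strip() for el in elements]


node_xml = serialize(nodes)
for line in node_xml:
    print(line)
```

To run this over many files, ET.parse(path).getroot() can replace ET.fromstring(SAMPLE) inside a loop over the file names.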
Also, well-formed XML requires all attribute values to be quoted:
<document>
<nodes>
<node id="123">pages of txt</node>
<node id="124">more example pages of txt and sub elements</node>
</nodes></document>
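With hundreds of files, it may be worth checking which ones are well-formed before extracting anything. A minimal sketch, again assuming Python's standard-library xml.etree.ElementTree; is_well_formed is a hypothetical helper name, and the two strings are cut-down versions of the example above:

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Hypothetical helper: True if the parser accepts the text."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        # Raised for any well-formedness error, including
        # attribute values that are not quoted.
        return False


unquoted = "<nodes><node id=123>pages of txt</node></nodes>"
quoted = '<nodes><node id="123">pages of txt</node></nodes>'

print(is_well_formed(unquoted))
print(is_well_formed(quoted))
```

Files that fail this check would need their attribute quoting fixed before any XPath tool can read them.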
I've found that this page gives lots of useful XPath examples: http://msdn.microsoft.com/en-us/library/ms256086(v=vs.110).aspx