xmlstarlet: Querying and concatenating nested child elements with a single query


I have an XML file like this:

    <?xml version="1.0" encoding="utf-8"?>
    <project>
    <data>
        <modelType type="InstantMessage">
            <model type="InstantMessage" id="1" >
                <modelField name="From" type="Party">
                    <model type="Party" id="123456">
                        <field name="Identifier" type="String">
                            <value type="String">foo</value>
                        </field>
                    </model>
                </modelField>
                <multiModelField name="To" type="Party" />
                    <field name="Body" type="String">
                        <value type="String">bar</value>
                    </field>
                    <field name="TimeStamp" type="TimeStamp">
                        <value type="TimeStamp">2016-07-11 13:26:38+02:00</value>
                    </field>
            </model>
        </modelType>
    </data>
    </project>

I want to produce the following result with a single query:

foo|bar

I do not know how to access these fields when they are nested at different levels. I tried something like:

root@machine:/.../# xmlstarlet sel -T -t -m /project/data/modelType/model -v "concat(/modelField/model/field/value'|'/field[@Body]/value)" file.xml

but I keep getting syntax errors from xmlstarlet. I do not understand from the manual how to use it. Does anyone know how to use xmlstarlet in this case?

Thanks, Peter


Solution

  • First check that your XML file is well-formed. A missing close tag (for <project>, for example) causes a parsing error, which prevents xmlstarlet from executing the query at all.
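
    As a rough illustration of that failure mode, the Python sketch below (standard library only; the two shortened documents and the helper name is_well_formed are made up for this example) shows a parser rejecting a document whose root element is never closed. xmlstarlet uses a different parser, so the exact error message will differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened documents: one well-formed, one missing </project>.
GOOD = "<project><data/></project>"
BAD = "<project><data/>"


def is_well_formed(text):
    """Return True if text parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed(GOOD))  # True
print(is_well_formed(BAD))   # False
```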

    The query itself has a few problems:

    1. The syntax for the concat function is concat(a,b,c); your invocation leaves out the commas.

    2. Inside a match, xpaths are relative to the matched node. But the first element in the concat:

      /modelField/model/field/value
      

      is absolute, so it can only match from the root, which it doesn't. You need a relative expression:

      modelField/model/field/value
      

      or

      ./modelField/model/field/value
      

      And the last xpath:

      /field[@Body]/value
      

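      won't be found either, because field is not the root element. Without the leading / the path is relative to the matched node. In your sample the Body field is in fact a direct child of the matched model (multiModelField is self-closing, whatever the indentation suggests), so a relative field/value path with a suitable predicate would work. Alternatively, use .// to select any descendant of the matched node: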

      .//field[@Body]/value
      
    3. However, the specifier [@Body] is incorrect. As written, the selector succeeds if the element has an attribute named Body. You are trying to match an element with an attribute named name whose value is Body, which you would write as [@name="Body"]. The quotes are mandatory, which means that you need to use single quotes around the expression or backslash-escape the quotes.
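
    The points about predicates and relative paths can be cross-checked outside xmlstarlet. The sketch below uses Python's xml.etree.ElementTree, which supports only a subset of XPath (no concat, for instance), on a compacted copy of the sample document. The variable names are illustrative, not part of any API.

```python
import xml.etree.ElementTree as ET

# Compacted copy of the sample document from the question.
XML = """<project><data><modelType type="InstantMessage">
<model type="InstantMessage" id="1">
  <modelField name="From" type="Party">
    <model type="Party" id="123456">
      <field name="Identifier" type="String"><value type="String">foo</value></field>
    </model>
  </modelField>
  <multiModelField name="To" type="Party" />
  <field name="Body" type="String"><value type="String">bar</value></field>
  <field name="TimeStamp" type="TimeStamp"><value type="TimeStamp">2016-07-11 13:26:38+02:00</value></field>
</model>
</modelType></data></project>"""

root = ET.fromstring(XML)  # root is the <project> element
# Context node, playing the role of -m /project/data/modelType/model.
model = root.find("data/modelType/model")

# [@Body] asks for an attribute *named* Body; no element has one.
by_attr_name = model.findall(".//field[@Body]")
# [@name='Body'] asks for a name attribute whose *value* is Body.
by_attr_value = model.findall(".//field[@name='Body']")
# A plain relative step only looks at direct children of the context node.
direct = [f.get("name") for f in model.findall("field")]
# .// looks at all descendants, including the nested Identifier field.
descendants = [f.get("name") for f in model.findall(".//field")]

print(len(by_attr_name), len(by_attr_value))  # 0 1
print(direct)       # ['Body', 'TimeStamp']
print(descendants)  # ['Identifier', 'Body', 'TimeStamp']
```

    The second line of output shows that Body and TimeStamp are direct children of the outer model, while Identifier is only reachable through modelField/model or with .//.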

    Putting that all together, and assuming the XML file is well-formed, you could use:

    xmlstarlet sel -T \
      -t -m /project/data/modelType/model \
         -v 'concat(modelField/model/field/value,"|",.//field[@name="Body"]/value)' \
      file.xml
    

    The concat call is not really necessary since you can use several -v options, and -o to output a fixed string. You might find the following more readable:

    xmlstarlet sel -T \
      -t -m '/project/data/modelType/model' \
         -v './/field[@name="Identifier"]/value' \
         -o '|' \
         -v './/field[@name="Body"]/value' \
      file.xml
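
    If you ever need the same result from a script rather than from the command line, the following is a minimal sketch of the equivalent logic in Python, assuming the document structure shown in the question. The function name summarize is hypothetical, and findtext returns only the first match for each path.

```python
import xml.etree.ElementTree as ET

# Compacted copy of the sample document from the question.
XML = """<project><data><modelType type="InstantMessage">
<model type="InstantMessage" id="1">
  <modelField name="From" type="Party">
    <model type="Party" id="123456">
      <field name="Identifier" type="String"><value type="String">foo</value></field>
    </model>
  </modelField>
  <multiModelField name="To" type="Party" />
  <field name="Body" type="String"><value type="String">bar</value></field>
  <field name="TimeStamp" type="TimeStamp"><value type="TimeStamp">2016-07-11 13:26:38+02:00</value></field>
</model>
</modelType></data></project>"""


def summarize(xml_text):
    """Return one 'Identifier|Body' string per top-level model element."""
    root = ET.fromstring(xml_text)  # root is <project>
    lines = []
    # Same context nodes as -m /project/data/modelType/model.
    for model in root.findall("data/modelType/model"):
        ident = model.findtext(".//field[@name='Identifier']/value", default="")
        body = model.findtext(".//field[@name='Body']/value", default="")
        lines.append(ident + "|" + body)
    return lines


print("\n".join(summarize(XML)))  # foo|bar
```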