Search code examples
xmlxmlstarlet

extract xml nodes as table with xmlstarlet


I have a pretty simple but huge xml (more than 50.000 lines). I want to extract just one repeated node as table (ring). Take into account this is an example. The real file has more than 300 elements inside "ring" node, so I try to avoid to write command for extraction element by element if that is possible.

Here a little example

<xml>
<root>
  <ration>
    <ring>
      <id  value="1"/>
      <date  value="2021-01-01"/>
      <price  value="435"/>
    </ring>
  </ration>
</root>
<root>
  <ration>
    <ring>
      <id  value="14"/>
      <date  value="2021-02-01"/>
      <price  value="745"/>
    </ring>
  </ration>
</root>
</xml>

what I want to achieve is: this xmlstarlet command is working and I get the proper result but now think on doing it 300 times (for every element I have to insert the call into the concat part). I need a more simple approach.

xmlstarlet sel -T -t -m '//root/ration/ring' -v "concat(id/@value,',',date/@value,',',price/@value)"  -n file.xml

id, date, price
1, 2021-01-01, 435
14, 2021-02-01, 745

Solution

  • It's been a while since you asked. Nevertheless...

    xmlstarlet sel -T \
        -t -m '//root[1]/ration/ring/*[boolean(@value)]' \
        -v 'substring(", ", 1, 2*(position() != 1))' -v 'local-name()' -b -nl \
        -t -m '//root' -m 'ration/ring/*[boolean(@value)]' \
        -v 'substring(", ", 1, 2*(position() != 1))' -v '@value' -b -nl \
        file.xml
    

    Output:

    id, date, price
    1, 2021-01-01, 435
    14, 2021-02-01, 745
    

    Template #1 emits the header, #2 the data. -m expressions #1 and #3 match child elements of ring having a @value attribute. Using -m and -b together does the trick. If @value is replaced with @* all attributes are selected; with a predicate such as @*[name()="value" or name()="otherval"] selection can be fine-tuned.

    Separation of values takes more work in XSLT 1.0 than in 2.0: in this case a substring function call returns an empty string for the first value where position() != 1 is false (converted to number 0), and a 2-character string (comma and space) for the following where it is true (1).

    By the way, tag / PI names beginning with xml in any lower/upper combination are reserved.