xml bash xmlstarlet

Iterate over XML with xmlstarlet and output parent and child node values


I have this format in multiple XML files:

<bad>
  <objdesc>
    <desc id="butwba10.1.wc.01" dbi="BUTWBA10.1.1.WC">
      <physdesc>adfa;sdfkjad</physdesc>
      <related objectid="bb435.1.comdes.02"/>
      <related objectid="but614r.1.penc.01"/>
      <related objectid="but611.1.wc.01"/>
      <related objectid="but612.1.wd.01"/>
      <related objectid="bb515.1.comb.12"/>
    </desc>
    <desc id="butwba10.1.wc.02" dbi="BUTWBA10.1.2.WC">
      <physdesc>alkdjfa;sfjsdf</physdesc>
      <related objectid="but621r.1.penc.01"/>
      <related objectid="bb435.1.comdes.03"/>
    </desc>
  </objdesc>
</bad>

I want output that looks like this:

butwba10.1.wc.01 dbi="BUTWBA10.1.1.WC" related="bb435.1.comdes.02, but614r.1.penc.01, but611.1.wc.01, but612.1.wd.01, bb515.1.comb.12"

butwba10.1.wc.02 dbi="BUTWBA10.1.2.WC" related="but621r.1.penc.01, bb435.1.comdes.03"  

I have a bash script that uses xmlstarlet to iterate over the XML files in a directory, but it dumps all the "related" values together after the last desc id. It needs to associate each desc id with its own set of "related" values, and it needs to include the dbi value for each id.

#!/bin/bash

for x in *.xml
do
    id=$(xml sel -t -v '//bad/objdesc/desc/@id' "$x")
    arr=( $(xml sel -t -v '//bad/objdesc/desc/related/@objectid' "$x") )
    cat<<EOF >> new_file
$id related="$(perl -e 'print join ",", @ARGV' "${arr[@]}")"
EOF
done

Solution

  • #!/bin/bash
    
    for x in *.xml; do
      # Number of <desc> elements in this file
      count=$(xml sel -t -v 'count(//bad/objdesc/desc/@id)' "$x")
      for ((i=1; i<=count; i++)); do
        # Select the i-th <desc> only, so each id keeps its own related values
        id=$(xml sel -t -v "//bad/objdesc/desc[$i]/@id" "$x")
        arr=( $(xml sel -t -v "//bad/objdesc/desc[$i]/related/@objectid" "$x") )
        cat<<EOF
    $id related="$(perl -e 'print join ", ", @ARGV' "${arr[@]}")"
    EOF
      done
    done
    

    =)

    It seems like this is a job for XSLT. But, OK, shell can handle this too...

    Can you do the rest for dbi? It's better to understand what is involved here than to just cut and paste.
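
For the dbi part, one possible approach is a minimal sketch that lets xmlstarlet do the per-desc iteration itself with `-m`, so a single call per file emits id, dbi, and the related values together. The helper name `desc_lines` and the `XML` variable are made up for illustration, and the sketch assumes an xmlstarlet build where `-o` accepts literal text containing double quotes.

```shell
#!/bin/bash

# Assumption: the binary is installed as `xmlstarlet`; some systems name it
# `xml`, as in the scripts above. Override with: XML=xml ./script.sh
XML=${XML:-xmlstarlet}

# Hypothetical helper: print one line per <desc> element of the given file.
# The outer -m visits each <desc>; the inner -m visits its <related> children
# and writes ", " after every objectid except the last one.
desc_lines() {
  "$XML" sel -t \
    -m '//bad/objdesc/desc' \
      -v '@id' -o ' dbi="' -v '@dbi' -o '" related="' \
      -m 'related' \
        -v '@objectid' \
        -i 'position() != last()' -o ', ' -b \
      -b \
      -o '"' -n \
    "$1"
}

for x in *.xml; do
  [ -e "$x" ] || continue   # skip the literal pattern when no file matches
  desc_lines "$x"
done
```

Because the attributes are read relative to the current `desc`, there is no need to count elements or index them with `desc[$i]`, and no perl call is needed for the join. Redirect the loop with `>> new_file` if the output should go to a file as in the question.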