I have this format in multiple XML files:
<bad>
<objdesc>
<desc id="butwba10.1.wc.01" dbi="BUTWBA10.1.1.WC">
<physdesc>adfa;sdfkjad</physdesc>
<related objectid="bb435.1.comdes.02"/>
<related objectid="but614r.1.penc.01"/>
<related objectid="but611.1.wc.01"/>
<related objectid="but612.1.wd.01"/>
<related objectid="bb515.1.comb.12"/>
</desc>
<desc id="butwba10.1.wc.02" dbi="BUTWBA10.1.2.WC">
<physdesc>alkdjfa;sfjsdf</physdesc>
<related objectid="but621r.1.penc.01"/>
<related objectid="bb435.1.comdes.03"/>
</desc>
</objdesc>
</bad>
I want output that looks like this:
butwba10.1.wc.01 dbi="BUTWBA10.1.1.WC" related="bb435.1.comdes.02, but614r.1.penc.01, but611.1.wc.01, but612.1.wd.01, bb515.1.comb.12"
butwba10.1.wc.02 dbi="BUTWBA10.1.2.WC" related="but621r.1.penc.01, bb435.1.comdes.03"
I have a bash script that uses xmlstarlet to iterate over the XML files in a directory, but it dumps all the "related" values after the last desc id. It needs to associate each desc id with its own set of "related" values, and it needs to include the dbi value for each id.
#!/bin/bash
for x in *.xml
do
    id=$(xml sel -t -v '//bad/objdesc/desc/@id' "$x")
    arr=( $(xml sel -t -v '//bad/objdesc/desc/related/@objectid' "$x") )
    cat<<EOF >> new_file
$id related="$(perl -e 'print join ",", @ARGV' "${arr[@]}")"
EOF
done
#!/bin/bash
for x in *.xml; do
    # Number of <desc> elements in this file.
    count=$(xml sel -t -v 'count(//bad/objdesc/desc/@id)' "$x")
    for ((i=1; i<=count; i++)); do
        # Restrict both queries to the i-th <desc> so each id stays paired
        # with its own related values.
        id=$(xml sel -t -v "//bad/objdesc/desc[$i]/@id" "$x")
        arr=( $(xml sel -t -v "//bad/objdesc/desc[$i]/related/@objectid" "$x") )
        cat<<EOF
$id related="$(perl -e 'print join ",", @ARGV' "${arr[@]}")"
EOF
    done
done
=)
It seems like this is a job for XSLT. But, OK, shell can handle this too...
Can you do the rest for dbi? It's better to try to understand what is involved here than to just cut and paste.
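One small detail apart from dbi: the desired output separates the related values with a comma and a space, while the perl one-liner joins them with a bare comma. Below is a minimal pure-bash sketch of a join that takes the separator as an argument. The name join_by is a hypothetical helper made up for illustration (it is not part of xmlstarlet or bash), and the sample values are copied from the second desc in the question.

```shell
#!/bin/bash
# join_by is a hypothetical helper, not a bash builtin or xmlstarlet feature.
# Usage: join_by SEPARATOR [ITEM...]
# Prints the items on one line with SEPARATOR between them.
join_by() {
    local sep=$1
    shift
    # With no items, print an empty line and stop.
    if (( $# == 0 )); then
        echo
        return
    fi
    local out=$1
    shift
    local item
    for item in "$@"; do
        out+="$sep$item"
    done
    printf '%s\n' "$out"
}

# Sample values copied from the second <desc> in the question.
arr=( but621r.1.penc.01 bb435.1.comdes.03 )
printf 'related="%s"\n' "$(join_by ', ' "${arr[@]}")"
```

In the loop above, the same call could stand in for the perl one-liner, assuming the arr array is filled the same way.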