xmllint to parse a file


I need help parsing an XML file and converting its values so they can be stored as CSV.

Here is the sample XML:

<list type='full' level='state' val='WI'>
<ac val='262'>
<ph val='0000000' />
<ph val='0003639' />
<ph val='0129292' />
</ac>
<ac val='363'>
<ph val='0000000' />
<ph val='0003639' />
</ac>
</list>

I need the output to look like this:

262, '0000000'
262, '0003639'
262, '0129292'
363, '0000000'
363, '0003639'

I tried looping over the file and its entries, but the problem is that we don't know how many phones (ph) there are under each area code (ac), so the phone extraction loop (j) has no reliable upper bound.

for i in {1..2}; do
    for j in {1..3}; do
        echo "i=$i, j=$j"
        xmllint  --xpath "concat(//ac[$i]/@val,',', //ac/ph[$j]/@val)" test.xml
    done
done

Is there a simple way to do this using xmllint?

Thanks


Solution

  • Here's one way using xmlstarlet: iterate over /list/ac/ph and, for each ph node, concatenate the parent node's ../@val with the current node's @val attribute value:

    xmlstarlet sel -t -m '/list/ac/ph' -v 'concat(../@val, ", ", @val)' --nl file.xml
    
    262, 0000000
    262, 0003639
    262, 0129292
    363, 0000000
    363, 0003639
    

    Here's another one using xq from kislyuk/yq, which converts the XML to JSON and runs a jq filter on it, so jq's built-in @csv generator handles the escaping:

    xq -r '.list.ac[] | [."@val" | tonumber] + (.ph[] | [."@val"]) | @csv' file.xml
    
    262,"0000000"
    262,"0003639"
    262,"0129292"
    363,"0000000"
    363,"0003639"
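
Since the question asks specifically about xmllint, here is a rough sketch of one way it might be done with xmllint alone: ask XPath's count() how many ac and ph elements there are, instead of hard-coding the loop bounds. The function name ac_phone_rows and the temporary file are made up for this illustration, and it assumes an xmllint build that supports --xpath.

```shell
#!/usr/bin/env bash
# Sketch only: ac_phone_rows is a hypothetical helper, not part of xmllint.
# Assumes xmllint (libxml2) with --xpath support is installed.

# Write the sample XML from the question to a temporary file.
xml_file=$(mktemp)
trap 'rm -f "$xml_file"' EXIT
cat > "$xml_file" <<'EOF'
<list type='full' level='state' val='WI'>
<ac val='262'>
<ph val='0000000' />
<ph val='0003639' />
<ph val='0129292' />
</ac>
<ac val='363'>
<ph val='0000000' />
<ph val='0003639' />
</ac>
</list>
EOF

# Print one "area_code, 'phone'" row per ph element.
ac_phone_rows() {
    local file=$1 ac_count ph_count ac ph i j
    # count() tells us how many ac elements exist, so no fixed range is needed.
    ac_count=$(xmllint --xpath 'count(/list/ac)' "$file")
    for ((i = 1; i <= ac_count; i++)); do
        ac=$(xmllint --xpath "string(/list/ac[$i]/@val)" "$file")
        # Count the ph children of this particular ac only.
        ph_count=$(xmllint --xpath "count(/list/ac[$i]/ph)" "$file")
        for ((j = 1; j <= ph_count; j++)); do
            # Scope ph[$j] to ac[$i]; a bare //ac/ph[$j] would match every ac.
            ph=$(xmllint --xpath "string(/list/ac[$i]/ph[$j]/@val)" "$file")
            printf "%s, '%s'\n" "$ac" "$ph"
        done
    done
}

ac_phone_rows "$xml_file"
```

Note that this starts a separate xmllint process for every lookup, so it is only reasonable for small files; the xmlstarlet one-liner above does the same job in a single pass.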