
Should I parse this XML with BASH?


From this XML example:

<String Name="descResist">
    <Description><![CDATA["resist_type_chimney"]]></Description>
    <Flags>
        <ParFlg_Child/>
    </Flags>
    <Value><![CDATA["90_min."]]></Value>
</String>

I'm trying to get this output:

descResist;resist_type_chimney
descResist;90_min.

Basically, I need to extract the CDATA content and concatenate it with the value of the Name attribute.

One problem is that the element isn't always String; it could also be Integer, Title, Boolean, etc.

I tried this:

$ grep -o "Name=\".*\"\|<\!\[CDATA\[.*\]\]>" file.xml | sed 's/<\!\[CDATA\[\"\(.*\)\"\]\]>/\1/'

which gives me:

Name="descResist"
resist_type_chimney
90_min.

How can I prefix the following lines with the value of the Name attribute?

It gets a little complicated when the output looks like this:

Name="descResist"
resist_type_chimney
90_min.
Name="anotherName"
foo_bar
Name="anoooother"
Name="notempty"
bar_foo

Is it also OK to process XML like this? There shouldn't be any nested <tagType Name=... elements, so I guess that won't be a problem.

EDIT: I'm working on Cygwin and looking for a simple bash/sed/awk solution.


Solution

  • Try this out:

    #!/bin/bash
    
    Name="InvalidName"
    while IFS= read -r line; do
            case "$line" in
                    Name=*) eval "$line" ;; # assuming $line is always bash-friendly Name="Value"; eval runs it as shell code
                    *) echo "$Name;$line" ;;
            esac
    done < <(egrep -o 'Name="[^"]*"|<!\[CDATA\[.*\]\]>' file.xml | sed -r 's/<!\[CDATA\["(.*)"\]\]>/\1/')
    

    I've changed your command slightly to use extended regular expressions (that's why it's "egrep" and "sed -r") so it's a bit easier to read.
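    To see what the while loop actually reads, you can run the grep/sed stage on its own. This is a sketch assuming GNU grep and sed (as shipped with Cygwin); it uses grep -E and sed -E, the same extended-regex modes as egrep and sed -r, and Name="[^"]*" instead of Name=".*" so the match stops at the first closing quote. The sample is written to a temporary file from mktemp:

```shell
#!/bin/bash
# Write the question's sample XML to a temporary file.
xml=$(mktemp)
cat > "$xml" <<'EOF'
<String Name="descResist">
    <Description><![CDATA["resist_type_chimney"]]></Description>
    <Flags>
        <ParFlg_Child/>
    </Flags>
    <Value><![CDATA["90_min."]]></Value>
</String>
EOF

# grep keeps only the Name="..." attributes and the CDATA sections;
# sed then strips the CDATA wrapper and the surrounding double quotes.
tokens=$(grep -oE 'Name="[^"]*"|<!\[CDATA\[.*\]\]>' "$xml" \
    | sed -E 's/<!\[CDATA\["(.*)"\]\]>/\1/')
printf '%s\n' "$tokens"

rm -f "$xml"
```

    On the sample this prints Name="descResist", resist_type_chimney and 90_min., one per line, so the loop only has to remember the most recent Name= line.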

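    I don't like the eval I've used (eval executes whatever text it gets, so a crafted Name value could run commands), but "export -n" does something strange here: it keeps the double quotes as part of the value. Avoiding the eval would make the code needlessly complex.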
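    A small sketch of that difference, using the token descResist from the question: export -n assigns the text after = verbatim, because quote removal does not apply to the result of expanding "$line", while eval re-parses it as shell code. This is why the eval-free script below also strips the quotes with sed.

```shell
#!/bin/bash
# A token exactly as the grep/sed pipeline emits it, quotes included.
line='Name="descResist"'

# export -n assigns (without exporting), but the quotes stay in the value.
export -n "$line"
with_export=$Name
printf 'export -n: %s\n' "$with_export"   # prints: export -n: "descResist"

# eval re-parses the text as shell code, so the quotes are removed.
# Any $(...) inside the value would be executed as well.
eval "$line"
with_eval=$Name
printf 'eval:      %s\n' "$with_eval"     # prints: eval:      descResist
```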

    It's OK to "parse" XML in Bash if you're really sure the text structure won't change. As soon as somebody decides to "optimize" the XML by collapsing it all onto a single line, you're toast: the greedy .* patterns will then match across several elements.
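    To see the breakage concretely, here is a sketch that feeds the question's sample, collapsed onto one line, through the same grep/sed stage (written with grep -E/sed -E and Name="[^"]*", assuming GNU tools). The greedy .* in the CDATA pattern now runs from the first <![CDATA[ to the last ]]>, so both values come out glued into a single token and the loop would print one garbled descResist;... line instead of two:

```shell
#!/bin/bash
# The question's sample collapsed onto a single line.
collapsed='<String Name="descResist"><Description><![CDATA["resist_type_chimney"]]></Description><Flags><ParFlg_Child/></Flags><Value><![CDATA["90_min."]]></Value></String>'

# Same grep/sed stage as in the script above.
tokens=$(printf '%s\n' "$collapsed" \
    | grep -oE 'Name="[^"]*"|<!\[CDATA\[.*\]\]>' \
    | sed -E 's/<!\[CDATA\["(.*)"\]\]>/\1/')

# Two tokens instead of three; the second one spans both CDATA sections.
printf '%s\n' "$tokens"
```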

    EDIT

    Here's a script without the ugly eval:

    #!/bin/bash
    
    Name="InvalidName"
    while IFS= read -r line; do
            case "$line" in
                    Name=*) export -n "$line" ;; # assuming $line is always bash-friendly Name=Value
                    *) echo "$Name;$line" ;;
            esac
    done < <(egrep -o 'Name="[^"]*"|<!\[CDATA\[.*\]\]>' file.xml | sed -r 's/<!\[CDATA\["(.*)"\]\]>/\1/; s/Name="(.*)"/Name=\1/')
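    Since the question asks for a bash/sed/awk solution, the same "remember the last Name, prefix every value" idea also fits in a single awk program, with no eval or export -n. This is a sketch, not a drop-in replacement: like the scripts above, it assumes each Name attribute and each CDATA section sits on its own line, and additionally that every CDATA value is wrapped in double quotes. The function name prefix_cdata is made up for illustration:

```shell
#!/bin/bash
# Print "Name;value" for every quoted CDATA value, prefixed with the most
# recently seen Name="..." attribute, whatever the element (String,
# Integer, Title, ...). Assumes one Name / one CDATA per line at most.
prefix_cdata() {
    awk '
        match($0, /Name="[^"]*"/) {
            # Skip the 6 characters of Name=" and drop the closing quote.
            name = substr($0, RSTART + 6, RLENGTH - 7)
        }
        match($0, /<!\[CDATA\["[^"]*"]]>/) {
            # Skip the 10 characters of <![CDATA[" and the 4 of "]]>.
            print name ";" substr($0, RSTART + 10, RLENGTH - 14)
        }
    ' "$@"
}
```

    Usage: prefix_cdata file.xml. Elements without a CDATA value (like anoooother in the question) print nothing, just as with the bash scripts.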