Tags: sed, awk, hp-ux

Search and print the values inside tags using a script


I have a file like this, abc.txt:

<ra><r>12.34</r><e>235</e><a>34.908</a><r>23</r><a>234.09</a><p>234</p><a>23</a></ra>
<hello>sadfaf</hello>
<hi>hiisadf</hi>
<ra><s>asdf</s><qw>345</qw><a>345</a><po>234</po><a>345</a></ra>

I need to find each <ra> tag and, for every <a> tag inside it, store the value in a variable so that I can process it further. How should I do this?

The values of the <a> tags inside each <ra> tag are:
34.908,234.09,23
345,345


Solution

  • This awk command should do it:

    cat file
    <ra><r>12.34</r><e>235</e><a>34.908</a><r>23</r><a>234.09</a><p>234</p><a>23</a></ra><a>12344</a><ra><e>45</e><a>666</a></ra>
    <hello>sadfaf</hello>
    <hi>no print from this line</hi><a>256</a>
    <ra><s>asdf</s><qw>345</qw><a>345</a><po>234</po><a>345</a></ra>
    

    awk -v RS="<" -F">" '/^ra/,/\/ra/ {if (/^a>/) print $2}' file
    34.908
    234.09
    23
    666
    345
    345
    

    It also handles multiple <ra>...</ra> groups on one line, and it skips <a> tags that are outside of a <ra> group.


    A small variation:

    awk -v RS=\< -F\> '/\/ra/ {f=0} f&&/^a>/ {print $2} /^ra/ {f=1}' file
    34.908
    234.09
    23
    666
    345
    345
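
If the values should stay grouped per <ra> block, as in the expected output shown in the question, the same record-splitting idea can be extended to join each group with commas. This is only a sketch: it writes the question's sample to a temporary file, and the names `inside`, `line` and `grouped` are illustrative, not part of the original answer.

```shell
# Sample input from the question, written to a temporary file.
file="${TMPDIR:-/tmp}/abc.$$.txt"
printf '%s\n' \
  '<ra><r>12.34</r><e>235</e><a>34.908</a><r>23</r><a>234.09</a><p>234</p><a>23</a></ra>' \
  '<hello>sadfaf</hello>' \
  '<hi>hiisadf</hi>' \
  '<ra><s>asdf</s><qw>345</qw><a>345</a><po>234</po><a>345</a></ra>' > "$file"

grouped=$(awk -v RS='<' -F'>' '
    /^ra>/   { inside = 1; line = ""; next }          # opening <ra>: start a new group
    /^\/ra>/ { if (inside && line != "") print line   # closing </ra>: print the collected group
               inside = 0; next }
    inside && /^a>/ { line = (line == "" ? $2 : line "," $2) }   # append each <a> value
' "$file")

echo "$grouped"
rm -f "$file"
```

With the sample file this prints one comma-separated line per <ra> block, which can then be split again on commas if needed.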
    

    How does it work:

    awk -v RS="<" -F">" '   # Set the record separator to "<", so every tag starts a new record, and the field separator to ">"
    /^ra/,/\/ra/ {          # for every record from one starting with "ra" through one containing "/ra" do
        if (/^a>/)          # if the record starts with "a>" do
        print $2}'          # print field 2, the text after the ">"
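
The flag-based variation can be read the same way. The sketch below spells it out with comments on a small made-up input (the `input` and `values` names are hypothetical). The order of the rules matters: the flag is cleared before the print rule and set after it, so the <ra> and </ra> records themselves are never printed.

```shell
# Hypothetical two-line input: <a> tags both outside and inside an <ra> block.
input='<a>1</a><ra><a>2</a><b>3</b><a>4</a></ra>
<a>5</a>'

values=$(printf '%s\n' "$input" | awk -v RS='<' -F'>' '
    /^\/ra>/   { f = 0 }       # closing </ra>: stop collecting
    f && /^a>/ { print $2 }    # inside a block and the record is an <a> tag: print its value
    /^ra>/     { f = 1 }       # opening <ra>: start collecting with the next record
')

echo "$values"
```

Only the two values inside the <ra> block are printed; the <a> tags before and after it are ignored because `f` is 0 at that point.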
    

    To see how changing RS splits the input into records, try:

    awk -v RS="<" '$1=$1' file
    ra>
    r>12.34
    /r>
    e>235
    /e>
    a>34.908
    /a>
    r>23
    /r>
    a>234.09
    /a>
    p>234
    ...
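
To see what the field separator adds on top of that, a small sketch like the following prints field 1 and field 2 of every record in brackets. The one-line input and the `split` variable are made up for illustration.

```shell
split=$(printf '%s\n' '<ra><a>34.908</a></ra>' | awk -v RS='<' -F'>' '
    { sub(/\n$/, "") }                            # drop the newline left at the end of the last record
    $0 != "" { printf "[%s] [%s]\n", $1, $2 }     # field 1 is the tag name, field 2 the text after ">"
')

echo "$split"
```

Only the record for the opening <a> tag has a non-empty second field, which is why printing $2 for records that start with "a>" yields the tag's value.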
    

    To store the output in a variable, you can do as BMW suggested:

    var=$(awk ...)
    var=$(awk -v RS=\< -F\> '/\/ra/ {f=0} f&&/^a>/ {print $2} /^ra/ {f=1}' file)
    echo $var
    34.908 234.09 23 666 345 345
    echo "$var"
    34.908
    234.09
    23
    666
    345
    345
    

    Since there are many values, you can use an array (in a shell that supports this syntax, such as bash):

    array=($(awk -v RS=\< -F\> '/\/ra/ {f=0} f&&/^a>/ {print $2} /^ra/ {f=1}' file))
    echo ${array[2]}
    23
    echo ${array[0]}
    34.908
    echo ${array[*]}
    34.908 234.09 23 666 345 345
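
If the `array=(...)` syntax is not available in your shell (worth checking on HP-UX), a plain `for` loop over the captured output should work in any POSIX shell. The sketch below uses the question's sample written to a temporary file; `values`, `value` and `count` are illustrative names, and the loop relies on word splitting, so it assumes the values contain no whitespace or glob characters.

```shell
# Sample input from the question, written to a temporary file.
file="${TMPDIR:-/tmp}/abc.$$.txt"
printf '%s\n' \
  '<ra><r>12.34</r><e>235</e><a>34.908</a><r>23</r><a>234.09</a><p>234</p><a>23</a></ra>' \
  '<hello>sadfaf</hello>' \
  '<hi>hiisadf</hi>' \
  '<ra><s>asdf</s><qw>345</qw><a>345</a><po>234</po><a>345</a></ra>' > "$file"

values=$(awk -v RS='<' -F'>' '/^\/ra>/ {f=0} f&&/^a>/ {print $2} /^ra>/ {f=1}' "$file")

count=0
for value in $values; do          # unquoted on purpose: split the output into words
    count=$((count + 1))
    echo "value $count: $value"   # replace this line with the real processing
done

rm -f "$file"
```

Each pass through the loop sees one <a> value in `$value`, so the further processing can go in the loop body.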