Tags: xml, bash, xpath, sed, xmllint

Getting elements from XML and storing them in an array using a shell script


I have XML like this:

<URLS xmlns:"http://www.example.com">
    <Service>
        <forwardUrl>
            <value>http://www.example1.com:80</value>
            <value>http://www.example2.com:80</value>
            .
            .
            .
       </forwardUrl>
    </Service>
</URLS>

I want to store all the forward URLs in an array.

I tried this:

let urlcount=$(sed -e "s/xmlns/ignore/" /tmp/in.xml | xmllint --xpath "count(//forwardUrl/value)"  -)
declare -a urls=()

for((i=1; i <= $urlcount; i++)); do
    echo $i
    urls[$i]=$(sed -e "s/xmlns/ignore/" /tmp/in.xml | xmllint --xpath '//forwardUrl/value["$i"]/text()' -)
done

But when I run echo ${urls[7]}, it prints all the values.

I want each URL stored at its own index. Please help me with this.


Solution

  • How about something like this using only sed:

    $ cat file1
    <URLS xmlns:"http://www.example.com">
        <Service>
            <forwardUrl>
                <value>http://www.example1.com:80</value>
                <value>http://www.example2.com:80</value>
                <value>http://www.example3.com:80</value>
                <value>http://www.example4.com:80</value>
           </forwardUrl>
        </Service>
    </URLS>
    $ declare -a array=($(sed -n '/\s*<forwardUrl>/,/<\/forwardUrl>/p' file1 | sed -e 's/<[^>]*>//g' -e '/^\s*$/d' -e 's/\s*//g'))
    $ echo "${array[0]}"
    http://www.example1.com:80
    $ echo "${array[1]}"
    http://www.example2.com:80
    $ echo "${array[2]}"
    http://www.example3.com:80
    $ echo "${array[3]}"
    http://www.example4.com:80
    $ echo "${array[@]}"
    http://www.example1.com:80 http://www.example2.com:80 http://www.example3.com:80 http://www.example4.com:80
    $
    

    Expression breakdown:

    declare -a array=($(sed -n '/\s*<forwardUrl>/,/<\/forwardUrl>/p' file1 | sed -e 's/<[^>]*>//g' -e '/^\s*$/d' -e 's/\s*//g'))
    
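    1. sed -n '/\s*<forwardUrl>/,/<\/forwardUrl>/p' file1 prints only the lines from the one matching <forwardUrl> through the one matching </forwardUrl> (both inclusive).
    2. sed -e 's/<[^>]*>//g' -e '/^\s*$/d' -e 's/\s*//g' applies three expressions in order: the first deletes all tags, the second deletes lines that are empty or contain only whitespace, and the third removes all remaining whitespace.
    3. The unquoted $(...) is then split on whitespace by the shell, so each URL becomes one element of the array.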
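
The pipeline above depends on the shell word-splitting the unquoted $(...), which also subjects the output to glob expansion, and on \s, which is a GNU sed extension. A minimal sketch of a variant, assuming bash 4 or later for mapfile: it uses POSIX character classes in place of \s and reads one output line per array element. The temporary file and the names xml_file and urls are hypothetical, chosen only for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical sample file mirroring the structure shown in the question.
xml_file=$(mktemp)
cat > "$xml_file" <<'EOF'
<URLS xmlns:"http://www.example.com">
    <Service>
        <forwardUrl>
            <value>http://www.example1.com:80</value>
            <value>http://www.example2.com:80</value>
        </forwardUrl>
    </Service>
</URLS>
EOF

# Same two sed stages as in the answer, with [[:space:]] instead of \s.
# mapfile -t stores each output line as one array element, without
# word splitting or glob expansion.
mapfile -t urls < <(
    sed -n '/<forwardUrl>/,/<\/forwardUrl>/p' "$xml_file" |
        sed -e 's/<[^>]*>//g' -e '/^[[:space:]]*$/d' -e 's/[[:space:]]//g'
)

printf '%s\n' "${#urls[@]}"   # number of URLs found
printf '%s\n' "${urls[@]}"    # one URL per line
rm -f "$xml_file"
```

As with the original one-liner, this treats the XML as line-oriented text, so it assumes each <value> element sits on its own line.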

    Edit 1:

    $ cat file1
    <URLS xmlns:"http://www.example.com">
        <Service>
            <forwardUrl>
                <value>http://www.sun.com:80</value>
                <value>http://www.example2.com:80</value>
                <value>http://www.example3.com:80</value>
                <value>http://www.example4.com:80</value>
           </forwardUrl>
        </Service>
    </URLS>
    $ declare -a array=($(sed -n '/\s*<forwardUrl>/,/<\/forwardUrl>/p' file1 | sed -e 's/<[^>]*>//g' -e '/^\s*$/d' -e 's/\s*//g'))
    $ echo "${array[0]}"
    http://www.sun.com:80
    $ echo "${array[@]}"
    http://www.sun.com:80 http://www.example2.com:80 http://www.example3.com:80 http://www.example4.com:80
    $
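
On why the xmllint loop in the question returned every value at each index: the XPath expression there is wrapped in single quotes, so the shell never expands $i. XPath then sees the string literal "$i" as the predicate, and a non-empty string counts as true for every node. The sketch below only builds the two expression strings to show the difference; the variable names are hypothetical, and the double-quoted form is what would be passed to xmllint --xpath instead.

```shell
#!/usr/bin/env bash
i=2

# Single quotes: the shell leaves $i alone, so XPath receives the string
# literal "$i" as the predicate, which is true for every <value>.
single_quoted='//forwardUrl/value["$i"]/text()'

# Double quotes: the shell substitutes the number, giving a positional
# predicate that selects only the i-th <value>.
double_quoted="//forwardUrl/value[$i]/text()"

printf '%s\n' "$single_quoted" "$double_quoted"
```

This prints //forwardUrl/value["$i"]/text() followed by //forwardUrl/value[2]/text().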