Bash script and xml/rss parsing


I'm writing a small script that parses an RSS feed using xmllint.

Right now I fetch the list of titles with the following command:

ITEMS=`echo "cat //title" | xmllint --shell rss.xml `
echo $ITEMS > tmpfile

But it returns:

<title>xxx</title> ------- <title>yyy :)</title> ------- <title>zzzzzz</title>

without newlines or spacing between the entries. I'm only interested in the text content of the title tags and, if possible, I'd like to iterate over the titles with a for/while loop, something like:

for  val in $ITEMS 
do
       echo $val
done

How can this be done? Thanks in advance.


Solution

  • I had a similar requirement to parse XML in bash at some point. I ended up using xmlstarlet (http://xmlstar.sourceforge.net/), which you might be able to install.

    If not, something like this will remove the surrounding tags:

    echo "cat  //title/text()" | xmllint --shell  rss.xml
    

    You will then need to clean up the output after piping it; a basic solution would be:

    echo "cat  //title/text()" | xmllint --shell  rss.xml  | grep -E '^\w'
    

    Hope this helps
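
To get the loop the question asks for, the filtered output can be read line by line instead of being stored in a variable and split by `for`, which breaks on whitespace and would turn a title like "yyy :)" into two items. The following is only a sketch: `filter_titles`, `list_titles` and `list_titles_xmlstarlet` are made-up helper names, the file is assumed to be `rss.xml` as in the question, and each title is assumed to fit on one line.

```shell
#!/bin/sh
# Sketch only: the three function names below are hypothetical helpers.

# Keep only lines that start with a letter, digit or underscore.
# This is the POSIX spelling of the '^\w' pattern; it drops the "/ > "
# prompts and " -------" separators that xmllint --shell prints.
# Limitation: a title that starts with punctuation or a space is dropped too.
filter_titles() {
    grep -E '^[[:alnum:]_]'
}

# Print the text of every <title> element in the file "$1", one per line.
list_titles() {
    echo "cat //title/text()" | xmllint --shell "$1" | filter_titles
}

# Alternative, assuming xmlstarlet is installed: -m matches each title,
# -v . prints its text and -n adds a newline, so no filtering is needed.
list_titles_xmlstarlet() {
    xmlstarlet sel -t -m '//title' -v . -n "$1"
}

# Iterate over the titles. "IFS= read -r" keeps spaces and backslashes intact.
if [ -f rss.xml ] && command -v xmllint >/dev/null 2>&1; then
    list_titles rss.xml | while IFS= read -r title; do
        printf 'Title: %s\n' "$title"
    done
fi
```

Because the `while` loop sits on the right-hand side of a pipe, it runs in a subshell, so variables set inside it are not visible after `done`. If you need them afterwards, write the titles to a temporary file first and redirect the loop's input from that file.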