
String tokenisation algorithm won't tokenise


Morning all, I am writing a bash script to extract the values of certain XML tags from all files in a given directory. I have decided to do this by tokenising each line and returning the relevant token. The problem is that it isn't tokenising correctly and I can't quite work out why. Here is the smallest example I could make that reproduces the issue:

#!/bin/bash
for file in `ls $MY_DIRECTORY`
do
    for line in `cat $MY_DIRECTORY/$file`
    do
        LOCALIFS=$IFS
        IFS=<>\"

        TOKENS=( $line )
        IFS=$LOCALIFS
        echo "Token 0: ${TOKENS[0]}" 
        echo "Token 1: ${TOKENS[1]}" 
        echo "Token 2: ${TOKENS[2]}" 
        echo "Token 3: ${TOKENS[3]}" 

    done
done

I'm guessing the issue is to do with my fiddling with IFS inside a loop which itself uses IFS (i.e. the cat operation), but this has never been a problem before.
Any ideas?

Thanks, Rik


Solution

  • Use a better tool to parse XML. Ideally that would be a proper parser, but if your requirement is simple and you know how your XML is structured, simple string manipulation might suffice. For example, suppose you have the following XML file and want the value of tag3:

    $  cat file
    blah
    <tag1>value1 </tag1>
    <tag2>value2 </tag2>
    <tag3>value3
    </tag3>
    blah
    
    $ awk -v RS="</tag3>" '/tag3/{ gsub(/.*tag3>/,""); print }' file
    value3
    

    So, to iterate over the files in your directory:

    for file in *.xml
    do
      value="$(awk -v RS="</tag3>" '/tag3/{ gsub(/.*tag3>/,""); print }' "$file")"
      echo "$value"
    done
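
As for why the original loop does not tokenise: as far as I can tell, the unquoted `IFS=<>\"` is not read by bash as a three-character string. `<>` is the read/write redirection operator, so the line assigns an empty string to IFS and opens a file literally named `"`. With an empty IFS no splitting happens, and the whole word ends up in `TOKENS[0]`. Separately, `for line in` over the output of `cat` splits on every space and newline, so it iterates over words rather than lines. If you do want to stay with pure bash, here is a minimal sketch of the quoted form. The names `tokenise_line` and `tokenise_file` are made up for this example, and it assumes each tag sits on a single line.

```shell
#!/bin/bash
# Sketch only: tokenise_line and tokenise_file are hypothetical helper names.

tokenise_line() {
    local line=$1
    local -a tokens
    local i
    # Quote the delimiters so < and > are not treated as redirections.
    # Setting IFS on the read command keeps the change local to that command.
    IFS='<>"' read -r -a tokens <<< "$line"
    for i in "${!tokens[@]}"; do
        printf 'Token %d: %s\n' "$i" "${tokens[i]}"
    done
}

tokenise_file() {
    local line
    # Read whole lines; the || clause also handles a last line with no newline.
    while IFS= read -r line || [[ -n $line ]]; do
        tokenise_line "$line"
    done < "$1"
}

tokenise_line '<tag1>value1 </tag1>'
```

Token 0 comes out empty here because the line starts with a delimiter; the tag name is in Token 1 and the value in Token 2. This still breaks on values that span several lines, which is why a record-based or parser-based approach is the safer choice.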