Extract a field from an XML file

XML file:

<head>
  <head2>
    <dict type="abc" file="/path/to/file1"></dict>
    <dict type="xyz" file="/path/to/file2"></dict>
  </head2>
</head>

I need to extract the list of files from this. So the output would be

/path/to/file1
/path/to/file2

So far, I've managed to come up with the following.

grep "<dict*file=" /path/to/xml.file | awk '{print $3}' | awk -F= '{print $NF}'

Solution

Quick and dirty, based on your sample only; this does not cover everything XML allows.

# sed, a bit safer: only look at lines between <head> and </head>
sed -e '/<head>/,/<\/head>/!d' -e '/.*[[:blank:]]file="\([^"]*\)".*/!d' -e 's//\1/' YourFile

# sed, brute force: print the file="..." value from any line that has one
sed -n 's/.*[[:blank:]]file="\([^"]*\)".*/\1/p' YourFile



# awk, quick and unsafe, based on your sample
awk -F 'file="|">' '/<head>/{h=1} /<\/head>/{h=0} h && /[[:blank:]]file/ { print $2 }' YourFile

That said, I don't recommend this kind of extraction on XML unless you really know the format and content of your source and no more appropriate tool is available. Extra fields, escaped quotes, string content that looks like a tag, and so on are a big cause of failures and unexpected results.
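To illustrate the kind of failure meant here, a minimal sketch: the sample below is hypothetical (it is the question's data with the second element rewritten to use single-quoted attributes, which XML permits), and the brute-force sed silently drops that line.

```shell
# Hypothetical sample: same data as in the question, except that the
# second <dict> uses single-quoted attributes (valid XML).
sample=$(mktemp)
cat > "$sample" <<'EOF'
<head>
  <head2>
    <dict type="abc" file="/path/to/file1"></dict>
    <dict type='xyz' file='/path/to/file2'></dict>
  </head2>
</head>
EOF

# The pattern only knows about file="...", so the second path is missed.
found=$(sed -n 's/.*[[:blank:]]file="\([^"]*\)".*/\1/p' "$sample")
printf '%s\n' "$found"   # prints only /path/to/file1

rm -f "$sample"
```

Nothing signals the missing line, which is why this approach is only reasonable when the input format is known and stable.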

Now, to reuse your own script:

#grep "<dict*file=" /path/to/xml.file | awk '{print $3}' | awk -F= '{print $NF}'
awk '! /<dict.*file=/ {next} {$0=$3;FS="\"";$0=$0;print $2;FS=OFS}' YourFile

There is no need for grep in front of awk: use a leading pattern filter such as /<dict.*file=/ instead.
The second awk, which only exists to use a different separator (FS), can be folded into the same script by changing FS. Because a new FS only takes effect at the next evaluation (by default, the next line), you can force the current content to be re-split with $0=$0.
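A small sketch of that re-splitting behaviour, using one field taken from the sample as input (the variable names are only for illustration). On an awk that follows POSIX here, such as gawk or mawk, the first command sees the record still split on whitespace, so $2 is empty; some older implementations split fields lazily and may behave differently.

```shell
line='file="/path/to/file1"></dict>'

# FS is changed inside the action, but the current record was already
# split with the old FS (whitespace), so $2 is not the path.
without_resplit=$(printf '%s\n' "$line" | awk '{ FS = "\""; print $2 }')

# Assigning $0 to itself forces the record to be split again,
# this time with the new FS.
with_resplit=$(printf '%s\n' "$line" | awk '{ FS = "\""; $0 = $0; print $2 }')

printf 'without $0=$0: [%s]\n' "$without_resplit"
printf 'with    $0=$0: [%s]\n' "$with_resplit"
```

The second command yields /path/to/file1, which is what the combined script above relies on.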