
How to use XMLStarlet to lookup values in a second file


Let's assume we have two directories:

/home/a
/home/b

In directory a we have lots of XML files like this:

<root>
  <id>87182378127381273</id>
  <name>just a name</name>
</root>

and for each id there is a matching XML file in directory b, like:

/home/b/87182378127381273.xml
...

and in that file we have for instance:

<root>
  <counter1>879</counter1>
</root>

Now I want to run an xmlstarlet command that outputs the following line for each XML file found in directory a:

87182378127381273,just a name,879
...

I tried to solve this with the following xmlstarlet command:

find . -iname '*.xml' | xargs xmlstarlet sel \
  -t -m "/root" -i "./id" -v "./id" -o "," -v "./name" -b -n | grep -v '^$'

Now I want to use the --var option to load the second XML file by constructing its path from the value of id, and then output the counter1 value, but I don't know how. Any ideas?


Solution

  • Answer rewritten after question was rephrased.

    This should do it:

    # shellcheck  shell=sh  disable=SC2016
    find '/home/a' -type f -iname '*.xml' -exec xmlstarlet select --text \
      -t -m 'root[string(id)]' \
           --var bpath='concat("/home/b/",id,".xml")' \
           --var ct='document($bpath)/root/counter1' \
           -v 'concat(id,",",name,",",$ct)' -n \
      {} +
    
    • skip xargs: find can invoke the command itself, passing as many filenames ({} +) as the command line can hold and repeating as needed
    • use an XPath predicate to match a root element with a non-empty id child element; no output is generated if root[string(id)] isn't matched
    • use the XSLT document function to look up a value in an external XML file
      • if using a relative pathname with document() it must be relative to the directory in which xmlstarlet is invoked (i.e. the current directory)
      • if the target file is inaccessible, xmlstarlet will issue a failed to load external entity "…" error message
    • use the XPath concat function to build one record as a string
    • don't forget select's -T (aka --text) option for plain-text output
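
    To try the command without touching /home/a and /home/b, the points above can be sketched in a throwaway sandbox. This is only an illustration under assumptions: the temporary directory layout, the file names sample.xml and noid.xml, and the shell variables tmp and out are made up here, and the path prefix is interpolated by the shell instead of being hard-coded. If xmlstarlet isn't installed, the lookup is skipped.

```shell
#!/bin/sh
# Sandbox sketch: the sample data mirrors the question; the temp layout
# and the file names sample.xml / noid.xml are assumptions for illustration.
tmp=$(mktemp -d)
mkdir "$tmp/a" "$tmp/b"

# A record in "a" with an id and a name.
cat > "$tmp/a/sample.xml" <<'EOF'
<root>
  <id>87182378127381273</id>
  <name>just a name</name>
</root>
EOF

# The matching lookup file in "b", named after the id.
cat > "$tmp/b/87182378127381273.xml" <<'EOF'
<root>
  <counter1>879</counter1>
</root>
EOF

# A record without an id: root[string(id)] should not match it.
cat > "$tmp/a/noid.xml" <<'EOF'
<root>
  <name>no id here</name>
</root>
EOF

if command -v xmlstarlet >/dev/null 2>&1; then
  # bpath is double-quoted so the shell expands $tmp; the other XPath
  # expressions stay single-quoted so $bpath and $ct reach xmlstarlet.
  out=$(find "$tmp/a" -type f -iname '*.xml' -exec xmlstarlet select --text \
    -t -m 'root[string(id)]' \
       --var bpath="concat('$tmp/b/',id,'.xml')" \
       --var ct='document($bpath)/root/counter1' \
       -v 'concat(id,",",name,",",$ct)' -n \
    {} +)
  printf '%s\n' "$out"
else
  out=
  echo 'xmlstarlet not found; skipping the lookup' >&2
fi

# Remove the sandbox again.
rm -rf "$tmp"
```

    With the sample data this should print one line for sample.xml and nothing for noid.xml, since only the former has a non-empty id.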