
Why doesn't my XMLStarlet query work when it includes quotes?


I have this XML

<Result>
<Dataset name='ident1'>
 <Row name='a1'>
    <queryname>cat0</queryname>
    <superfilename>cat1</superfilename>
    <indexfilename>cat2</indexfilename>
</Row>
 <Row name='a2'>
    <queryname>cat3</queryname>
    <superfilename>cat4</superfilename>
    <indexfilename>cat5</indexfilename>
 </Row>
 <Row name='a3'>
    <queryname>cat6</queryname>
    <superfilename>cat7</superfilename>
    <indexfilename>cat8</indexfilename>
 </Row>
</Dataset>
<Dataset name='Result 2'>
</Dataset>
<Dataset name='Result 3'>
</Dataset>
<Dataset name='Result 4'>
</Dataset>
</Result>

I want to count the Row elements inside the Dataset named ident1. The xmlstarlet command I am using is:

xmlstarlet sel -t -v 'count(/Result/Dataset[@name='ident1']/Row)' oscar.xml

I think it should work but it is returning 0.

I have tried other variations but all of them return 0.

xmlstarlet sel -t -v 'count(/Result/Dataset[@name='ident1'])' oscar.xml
xmlstarlet sel -t -v 'count(/Result/Dataset[@name='ident1'][*]/Row)' oscar.xml
xmlstarlet sel -t -v 'count(/Result/Dataset[@name='ident1']/Row[*])' oscar.xml

What am I doing wrong?

NOTE

If I count another element, such as Dataset, it correctly returns 4.

xmlstarlet sel -t -v 'count(/Result/Dataset)' oscar.xml

Solution

  • In your command, every quote is shell syntax; consequently, the shell strips all of them before the query is handed to XMLStarlet:

    # bad: looks for @name=ident1, no quotes
    # literal query is: count(/Result/Dataset[@name=ident1]/Row)
    # ...which compares @name against the value of an element under Dataset named ident1
    # ...since no such element exists, the result is a count of 0.
    xmlstarlet sel -t -v 'count(/Result/Dataset[@name='ident1']/Row)' oscar.xml
    

    Instead, make it:

    # good: uses ""s on the outside, so ''s on the inside are literal
    # literal query is: count(/Result/Dataset[@name='ident1']/Row)
    xmlstarlet sel -t -v "count(/Result/Dataset[@name='ident1']/Row)" oscar.xml
    

    ...or, if your shell is bash:

    # good (but nonportable): uses $'' syntax, which makes \' produce a single literal '
    # literal query is: count(/Result/Dataset[@name='ident1']/Row)
    xmlstarlet sel -t -v $'count(/Result/Dataset[@name=\'ident1\']/Row)' oscar.xml
    
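One way to check what XMLStarlet actually receives is to print the argument instead of running the query. The sketch below assumes only a POSIX shell; the variable names bad and good are illustrative, not part of the original commands.

```shell
# The query as written in the question: each inner ' ends or reopens the
# single-quoted string, so the shell removes all four quotes.
bad='count(/Result/Dataset[@name='ident1']/Row)'

# Double quotes on the outside: the inner single quotes are ordinary characters.
good="count(/Result/Dataset[@name='ident1']/Row)"

printf 'bad:  %s\n' "$bad"
printf 'good: %s\n' "$good"
# bad:  count(/Result/Dataset[@name=ident1]/Row)
# good: count(/Result/Dataset[@name='ident1']/Row)
```

Only the second string still contains the quotes that XPath needs to treat ident1 as a string literal.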

    All of the above happens because quoting, in shell, applies per character rather than per argument. You can, for instance, write:

    echo "$foo"'$bar'$baz
    

    ...and $foo will be expanded per double-quote rules (literal replacement with its contents), $bar will be treated as a literal string, and $baz will be expanded per unquoted-expansion rules (with string-splitting and globbing, which often produce unwanted behavior).
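The three quoting styles can be seen side by side with a small sketch. The values assigned to foo and baz and the helper show_args are assumptions made for illustration; the default IFS is assumed as well.

```shell
foo='a  b'   # two spaces between a and b
baz='x  y'   # two spaces between x and y

# Print each argument on its own line, in brackets, so word boundaries show.
show_args() { printf '[%s]\n' "$@"; }

# Same word as in the echo example: double-quoted, single-quoted, unquoted.
show_args "$foo"'$bar'$baz
# [a  b$barx]
# [y]
```

The double-quoted $foo keeps both of its spaces, '$bar' stays literal, and the unquoted $baz is split on whitespace, so its second half becomes a separate argument.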