xml parsing xpath w3c ebnf

Parsing an XPath expression: understanding the EBNF production rules


I have a beginner's question about the W3C specification of XPath expressions, which is written in EBNF notation. The specification can be found at: http://www.w3.org/TR/xpath/. In particular, I would like to understand how the following expression is derived:

(//attribute::name | //attribute::id)[starts-with(string(self::node()), "be") or starts-with(string(self::node()), "1")]

This appears to be a valid expression. I verified it using http://www.freeformatter.com/xpath-tester.html with the following XML document:

<documentRoot>
<!-- Test data -->
<?xc value="2" ?>
<parent name="data" >
   <child id="1"  name="alpha" >Some Text</child>
   <child id="2"  name="beta" >
      <grandchild id="2.1"  name="beta-alpha" ></grandchild>
      <grandchild id="2.2"  name="beta-beta" ></grandchild>
   </child>
   <pet name="tigger"  type="cat" >
      <data>
         <birthday month="sept"  day="19" ></birthday>
         <food name="Acme Cat Food" ></food>
      </data>
   </pet>
   <pet name="Fido"  type="dog" >
      <description>
         Large dog!
      </description>
      <data>
         <birthday month="feb"  day="3" ></birthday>
         <food name="Acme Dog Food" ></food>
      </data>
   </pet>
   <rogue name="is this real?" >
      <data>
         Hates dogs!
      </data>
   </rogue>
   <child id="3"  name="gamma"  mark="yes" >
      <!-- A comment -->
      <description>
         Likes all animals - especially dogs!
      </description>
      <grandchild id="3.1"  name="gamma-alpha" >
         <![CDATA[ Some non-parsable character data ]]>
      </grandchild>
      <grandchild id="3.2"  name="gamma-beta" ></grandchild>
   </child>
</parent>
</documentRoot>

This gives me the following results:

Attribute='id="1"'
Attribute='name="beta"'
Attribute='name="beta-alpha"'
Attribute='name="beta-beta"'
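The result can be cross-checked without an XPath engine. The following is a minimal sketch in Python, assuming a trimmed copy of the test document (only some of the elements are kept) and emulating the query by hand, because the standard library's ElementTree does not support the attribute axis or unions. The names XML and matching_attributes are made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the test document (assumption: only elements that carry
# name/id attributes matter for this query, so the rest is omitted).
XML = """<documentRoot>
<parent name="data">
   <child id="1" name="alpha">Some Text</child>
   <child id="2" name="beta">
      <grandchild id="2.1" name="beta-alpha"></grandchild>
      <grandchild id="2.2" name="beta-beta"></grandchild>
   </child>
   <pet name="tigger" type="cat"/>
   <child id="3" name="gamma" mark="yes"/>
</parent>
</documentRoot>"""


def matching_attributes(root):
    """Hand-written stand-in for
    (//attribute::name | //attribute::id)[starts-with(., "be") or starts-with(., "1")]
    """
    hits = []
    for element in root.iter():  # the root element and all its descendants
        for key, value in element.attrib.items():
            # Union part: keep only name and id attributes.
            # Predicate part: the string value starts with "be" or "1".
            if key in ("name", "id") and value.startswith(("be", "1")):
                hits.append((key, value))
    return hits


results = matching_attributes(ET.fromstring(XML))
for key, value in results:
    print("Attribute='%s=\"%s\"'" % (key, value))
```

On this trimmed document the sketch should report the same four attributes as the online tester.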

It is not clear to me which sequence of EBNF productions would produce the above query.

Thanks for any help.


Solution

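  • I am not sure what the conventional way to write a derivation is, so below "A > B" means that A is rewritten using one of its alternatives, B. At the top level, Expr eventually derives FilterExpr Predicate: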

    Expr > OrExpr > AndExpr > EqualityExpr > RelationalExpr > AdditiveExpr > MultiplicativeExpr > UnaryExpr > UnionExpr > PathExpr > FilterExpr > FilterExpr Predicate
    

    gives you the 2 parts:

    • the filter (//attribute::name | //attribute::id)
    • and the predicate [starts-with(string(self::node()), "be") or starts-with(string(self::node()), "1")]

    (//attribute::name | //attribute::id)

    FilterExpr > PrimaryExpr > '(' Expr ')'
    Expr > OrExpr > AndExpr > EqualityExpr > RelationalExpr > AdditiveExpr > MultiplicativeExpr > UnaryExpr > UnionExpr > UnionExpr '|' PathExpr
    

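    gives you //attribute::name (the left-hand UnionExpr, which reduces to a single PathExpr via UnionExpr > PathExpr) and //attribute::id (the right-hand PathExpr)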

    //attribute::name and //attribute::id

    PathExpr > LocationPath > AbsoluteLocationPath > AbbreviatedAbsoluteLocationPath > '//' RelativeLocationPath
    RelativeLocationPath > Step > AxisSpecifier NodeTest Predicate*
        - AxisSpecifier > AxisName '::'
            - AxisName > 'attribute'
        - NodeTest > NameTest
    

    NameTest > QName, the QName being name in the first path and id in the second
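The Step decomposition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it handles unabbreviated steps of the form AxisName '::' NodeTest with no predicates, it does not cover processing-instruction('literal') or wildcard name tests, and classify_step, STEP, AXIS_NAMES and NODE_TYPES are hypothetical names.

```python
import re

# The thirteen axis names and four node types defined by XPath 1.0.
AXIS_NAMES = {
    "ancestor", "ancestor-or-self", "attribute", "child", "descendant",
    "descendant-or-self", "following", "following-sibling", "namespace",
    "parent", "preceding", "preceding-sibling", "self",
}
NODE_TYPES = {"comment", "text", "processing-instruction", "node"}

# AxisName '::' followed by a name, optionally followed by '()'.
STEP = re.compile(r"^(?P<axis>[a-z-]+)::(?P<test>[^()\[\]]+)(?P<parens>\(\))?$")


def classify_step(step):
    """Return (axis, node-test kind, node-test text) for one unabbreviated step."""
    m = STEP.match(step)
    if m is None or m.group("axis") not in AXIS_NAMES:
        raise ValueError("not a simple unabbreviated step: %r" % step)
    axis, test = m.group("axis"), m.group("test")
    if m.group("parens"):
        # NodeTest > NodeType '(' ')'
        if test not in NODE_TYPES:
            raise ValueError("unknown node type: %r" % test)
        return axis, "NodeType", test
    # NodeTest > NameTest
    return axis, "NameTest", test


print(classify_step("attribute::name"))
print(classify_step("attribute::id"))
```

Both steps of the union come out as the attribute axis with a NameTest; a step such as self::node() would take the NodeType branch instead.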

    Predicate [starts-with(string(self::node()), "be") or starts-with(string(self::node()), "1")]

    Predicate > '[' PredicateExpr ']'
    PredicateExpr > Expr > OrExpr > OrExpr 'or' AndExpr
        - OrExpr > AndExpr
        - AndExpr > EqualityExpr > RelationalExpr > AdditiveExpr > MultiplicativeExpr > UnaryExpr > UnionExpr > PathExpr > FilterExpr > PrimaryExpr > FunctionCall > FunctionName '(' ( Argument ( ',' Argument )* )? ')'
            Argument > Expr
    

    FunctionName is starts-with. The first argument is another FunctionCall (the string function). The second argument is a Literal, "be" in the first call and "1" in the second, reached via PathExpr > FilterExpr > PrimaryExpr > Literal.

    Finally, self::node(), the argument of string, comes from:

    Argument > Expr > ... > PathExpr > LocationPath > RelativeLocationPath
    RelativeLocationPath > Step > AxisSpecifier NodeTest Predicate*
        - AxisSpecifier > AxisName '::'
            - AxisName > 'self'
        - NodeTest > NodeType '(' ')'
    

    NodeType being 'node'
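To see the whole derivation at once, here is a rough recursive-descent sketch in Python. It is not a full XPath 1.0 parser: it assumes only the constructs used in this one expression (unabbreviated steps, '//', '|', 'or', function calls, double-quoted literals and a predicate on a parenthesized expression), and it skips the unit productions between OrExpr and UnionExpr. All class and function names are invented for the example, and "ParenthesizedExpr" is my own label for PrimaryExpr > '(' Expr ')'.

```python
import re

# Token pattern covering only what this expression needs (an assumption).
TOKEN = re.compile(r'\s*(//|::|[()\[\]|,]|"[^"]*"|[A-Za-z_][\w.-]*)')


def tokenize(text):
    text, tokens, pos = text.strip(), [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError("unexpected input at offset %d" % pos)
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.toks, self.i = tokens, 0

    def peek(self, k=0):
        j = self.i + k
        return self.toks[j] if j < len(self.toks) else None

    def take(self, expected=None):
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError("expected %r, got %r" % (expected, tok))
        self.i += 1
        return tok

    def expr(self):  # Expr > OrExpr > OrExpr 'or' AndExpr
        node = self.union()
        while self.peek() == "or":
            self.take()
            node = ("OrExpr", node, self.union())
        return node

    def union(self):  # UnionExpr > UnionExpr '|' PathExpr
        node = self.path()
        while self.peek() == "|":
            self.take()
            node = ("UnionExpr", node, self.path())
        return node

    def path(self):  # PathExpr > LocationPath | FilterExpr
        if self.peek() == "//":
            self.take()
            return ("AbbreviatedAbsoluteLocationPath", self.step())
        if self.peek(1) == "::":
            return self.step()
        return self.filter()

    def step(self):  # Step > AxisName '::' NodeTest
        axis = self.take()
        self.take("::")
        name = self.take()
        if self.peek() == "(":  # NodeTest > NodeType '(' ')'
            self.take("(")
            self.take(")")
            return ("Step", axis, ("NodeType", name))
        return ("Step", axis, ("NameTest", name))

    def filter(self):  # FilterExpr > PrimaryExpr | FilterExpr Predicate
        node = self.primary()
        while self.peek() == "[":
            self.take()
            node = ("FilterExpr", node, ("Predicate", self.expr()))
            self.take("]")
        return node

    def primary(self):  # PrimaryExpr > '(' Expr ')' | Literal | FunctionCall
        tok = self.take()
        if tok == "(":
            inner = self.expr()
            self.take(")")
            return ("ParenthesizedExpr", inner)
        if tok.startswith('"'):
            return ("Literal", tok[1:-1])
        self.take("(")  # FunctionCall > FunctionName '(' arguments ')'
        args = []
        if self.peek() != ")":
            args.append(self.expr())
            while self.peek() == ",":
                self.take()
                args.append(self.expr())
        self.take(")")
        return ("FunctionCall", tok, *args)


def show(node, depth=0):
    """Print the tree, one production per line, indented by nesting depth."""
    leaves = [c for c in node[1:] if isinstance(c, str)]
    print("  " * depth + " ".join([node[0], *leaves]))
    for child in node[1:]:
        if isinstance(child, tuple):
            show(child, depth + 1)


QUERY = ('(//attribute::name | //attribute::id)'
         '[starts-with(string(self::node()), "be") or '
         'starts-with(string(self::node()), "1")]')
parser = Parser(tokenize(QUERY))
tree = parser.expr()
show(tree)
```

The printed tree has FilterExpr at the root, with the parenthesized UnionExpr as its first child and the Predicate holding the OrExpr as its second, which mirrors the order of the derivation above.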