
XPath's text() does not return custom entities


I have the following XML file:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE root [
  <!ELEMENT root (entry*)>
  <!ELEMENT entry (#PCDATA)>
  <!ENTITY abc "a b c">
  <!ENTITY xyz "x y z">
]>
<root>
  <entry>&abc;</entry>
  <entry>&xyz;</entry>
  <entry>text</entry>
</root>

I use the following command to test my XPaths on it:

xmllint --xpath '...' test.xml

I am trying to match some custom entities with an XPath that looks like:

//entry[text() = '&abc;']

But it doesn't match anything. So I even tried:

//entry/text()

And the only result is text from the last entry, nothing from the first two. If text() doesn't return custom entities, is there anything else that does? Is there a way to match only entries containing &abc;?


Solution

    Conformant behavior

    You cannot match against the entity reference &abc; itself. A conforming XML parser must replace an internal general entity reference with its replacement text (here, a b c) wherever the reference appears in the document's content, so XPath only ever sees the replacement text, never the reference.

    You can see this in action by changing your XPath from

    //entry[text() = '&abc;']
    

    which selects nothing to

    //entry[text() = 'a b c']
    

    which selects the entry element containing the replacement text.

    The replacement text should be available as text nodes, so

    //entry/text()
    

    selects three text nodes:

    a b c
    x y z
    text
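
    As a cross-check that does not depend on xmllint, here is a minimal sketch using Python's standard-library xml.etree.ElementTree. The choice of Python and of this parser is an assumption made for illustration; the question itself only uses xmllint. ElementTree does not support the text() = '...' predicate, so the sketch compares each entry's text directly instead of evaluating the XPath.

```python
import xml.etree.ElementTree as ET

# The document from the question, including its internal DTD subset.
XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE root [
  <!ELEMENT root (entry*)>
  <!ELEMENT entry (#PCDATA)>
  <!ENTITY abc "a b c">
  <!ENTITY xyz "x y z">
]>
<root>
  <entry>&abc;</entry>
  <entry>&xyz;</entry>
  <entry>text</entry>
</root>
"""

root = ET.fromstring(XML)

# Text of every entry, as seen after the parser has expanded entity references.
texts = [entry.text for entry in root.iter("entry")]
print(texts)

# Comparing against the replacement text finds the first entry;
# comparing against the literal reference finds nothing.
matches_replacement = [e for e in root.iter("entry") if e.text == "a b c"]
matches_reference = [e for e in root.iter("entry") if e.text == "&abc;"]
print(len(matches_replacement), len(matches_reference))
```

    If the parser expands the entities as described, all three entries carry plain text, and only the comparison against a b c finds a match.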
    

    xmllint's behavior

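    By default, xmllint leaves entity references in place instead of substituting them, which is why text() appears to skip the first two entries. To get the expected behavior, pass the (oddly named) --noent flag, which tells xmllint to substitute entity references with their replacement text, for example: xmllint --noent --xpath '...' test.xml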