php xml xpath epub

PHP / xPath Query on ncx (epub) fails


I'm unable to retrieve any results using XPath on files like this one:

<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1">
   <head>
       <meta name="dtb:uid" content="RT8513Z9UM0NLKLF8QX9QDJ3E6ZFL2"/>
       <meta name="dtb:depth" content="3"/>
   </head>
   <docTitle>
       <text>Document Title</text>
   </docTitle>
   <navMap>
       <navPoint id="navPoint-1" playOrder="1">
           <navLabel>
               <text>Section with no subsection</text>
           </navLabel>
           <content src="text/content001.xhtml"/>
       </navPoint>
       <navPoint id="navPoint-2" playOrder="2">
           <navLabel>
               <text>TOC entry name Section title
               </text>
           </navLabel>
           <content src="text/content001.xhtml#heading_id_3"/>
           <navPoint id="navPoint-3" playOrder="3">
               <navLabel>
                   <text>Section entry name.</text>
               </navLabel>
               <content src="text/content002.xhtml"/>
           </navPoint>
           <navPoint id="navPoint-4" playOrder="4">
               <navLabel>
                   <text>Introduction.</text>
               </navLabel>
           </navPoint>
       </navPoint>
   </navMap>
</ncx>

I'm running the following code:

$ncx = new DOMDocument();
$ncx->preserveWhiteSpace = false;
$ncx->load('/path/to/file');

$xpath = new DOMXPath( $ncx );

$query1 = 'namespace::*';
$result = $xpath->query( $query1 );
echo $result->length . PHP_EOL;

$query2 = '/ncx/navMap/navLabel/text[. = "Introduction."]';
$result = $xpath->query( $query2 );
echo $result->length . PHP_EOL;

$head = $ncx->getElementsByTagName('head')->item(0);

$query3 = 'head/meta[@name="dtb:depth"]';
$result = $xpath->query( $query3, $head );
echo $result->length . PHP_EOL;

$query4 = 'meta[@name="dtb:depth"]';
$result = $xpath->query( $query4, $head );
echo $result->length . PHP_EOL;

Only $query1 produces a valid result. Can anyone suggest where the mistake is?

Thanks


Solution

  • The core problem is that your XPath doesn't take the XML namespace into account. Your XML declares a default namespace here:

    <ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1">
    

    So the ncx element and its unprefixed descendants are all in that default namespace. XPath 1.0 treats an unprefixed name in a query as belonging to no namespace, so to query elements in the default namespace you need to map a prefix to the namespace URI and use that prefix in your XPath, for example:

    // map the prefix "d" to the default namespace URI
    $xpath->registerNamespace("d", "http://www.daisy.org/z3986/2005/ncx/");
    .....
    $head = $ncx->getElementsByTagName('head')->item(0);
    .....
    // use the registered prefix in the XPath (attributes stay unprefixed)
    $query4 = 'd:meta[@name="dtb:depth"]';
    $result = $xpath->query( $query4, $head );
    echo $result->length . PHP_EOL;
    

    Output:

    1
    

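    Apart from the namespace problem explained above, you need to recheck your XPath expressions, e.g. $query2, and make sure each one corresponds exactly to the location of the target elements in the XML. In $query2, navLabel is not a direct child of navMap: it sits inside navPoint elements, which can themselves be nested. Likewise, $query3 is evaluated relative to $head, so it looks for a head child of head, which does not exist.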
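
As a rough sketch of how $query2 and $query3 might look once both the namespace and the paths are fixed: the snippet below inlines a trimmed-down copy of the NCX from the question so it runs on its own. The prefix "d" and the variable names $labels and $depth are arbitrary choices for illustration, and the descendant axis (//) is just one way to reach the nested navLabel elements.

```php
<?php
// Trimmed-down copy of the NCX from the question, inlined so the sketch is self-contained.
$xml = <<<'XML'
<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1">
   <head>
       <meta name="dtb:uid" content="RT8513Z9UM0NLKLF8QX9QDJ3E6ZFL2"/>
       <meta name="dtb:depth" content="3"/>
   </head>
   <navMap>
       <navPoint id="navPoint-2" playOrder="2">
           <navLabel><text>TOC entry name Section title</text></navLabel>
           <navPoint id="navPoint-4" playOrder="4">
               <navLabel><text>Introduction.</text></navLabel>
           </navPoint>
       </navPoint>
   </navMap>
</ncx>
XML;

$ncx = new DOMDocument();
$ncx->preserveWhiteSpace = false;
$ncx->loadXML($xml);

$xpath = new DOMXPath($ncx);
// "d" is an arbitrary prefix; only the namespace URI has to match the document.
$xpath->registerNamespace('d', 'http://www.daisy.org/z3986/2005/ncx/');

// $query2: navLabel sits inside (possibly nested) navPoint elements,
// so search all descendants of navMap instead of its direct children.
$query2 = '/d:ncx/d:navMap//d:navLabel/d:text[. = "Introduction."]';
$labels = $xpath->query($query2);
echo $labels->length . PHP_EOL; // prints 1

// $query3: written as an absolute path, so no context node is needed.
$query3 = '/d:ncx/d:head/d:meta[@name="dtb:depth"]';
$depth = $xpath->query($query3);
echo $depth->item(0)->getAttribute('content') . PHP_EOL; // prints 3
```

Note that the text comparison is exact, so an entry whose label contains a trailing line break (like navPoint-2 in the original file) would only match with something like normalize-space(.) in the predicate.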