php, xml, rss, domdocument, xml-namespaces

What are DOMDocument namespaces for?


$xpath->registerNamespace('slash', 'http://purl.org/rss/1.0/modules/slash/');

From what I understand they act like document definitions, and are required to identify certain XML elements.

Does PHP actually do a request to that URL and verify if the element exists in the document definition?

Because that URL shows a 404 not found page :(

$result = $xpath->evaluate('string(//atom:entry[3]/slash:comments)');

Could this be the reason why I get an empty string while trying to retrieve the value of the <slash> element from an RSS feed?


Solution

  • $xpath->registerNamespace('slash', 'http://purl.org/rss/1.0/modules/slash/');
    

    From what I understand they act like document definitions, and are required to identify certain XML elements.

    Does PHP actually do a request to that URL and verify if the element exists in the document definition?

    No.
    That URI identifies an XML namespace, which represents an XML vocabulary. Namespaces exist so that different contexts can use the same term with different meanings: a single XML file can contain elements and attributes with the same local name, told apart by a prefix. For example, you can have an XML document like this:

    <html xmlns="http://www.w3.org/1999/xhtml" 
            xmlns:human="http://sample.xml.com/Human">
      <title>John Smith measures.</title>
      <body>
        <human:name>John</human:name> <human:surname>Smith</human:surname>
        is <human:height unit="feet">6</human:height> feet tall.
      </body>
    </html>
    

    In this document the "human" prefix marks elements from the http://sample.xml.com/Human namespace, while unprefixed elements belong to the default namespace, http://www.w3.org/1999/xhtml. These URIs are namespace identifiers, not schema locations (those are expressed with a DOCTYPE declaration or with the xsi:schemaLocation attribute of XML Schema instance). It is good practice to publish documentation of the namespace at the location its URI points to, but this is not required: the XHTML namespace URI leads to the related W3C documentation, whereas the URI of the RSS extension you are using does not.

    Note, however, that the DOMDocument properties resolveExternals and validateOnParse can cause external DTDs and entities referenced by the target XML to be loaded, but never namespace documentation. No parser downloads such documentation, since it is intended for human readers.
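
    To illustrate that a namespace is matched purely by comparing URI strings, here is a minimal sketch that queries the sample document above. The prefixes "h", "person" and "wrong" are arbitrary names chosen for this example; they deliberately differ from the prefixes used in the XML.

    ```php
    <?php
    // Sample document from above; neither namespace URI is ever fetched.
    $xml = '<html xmlns="http://www.w3.org/1999/xhtml"
          xmlns:human="http://sample.xml.com/Human">
      <title>John Smith measures.</title>
      <body>
        <human:name>John</human:name> <human:surname>Smith</human:surname>
        is <human:height unit="feet">6</human:height> feet tall.
      </body>
    </html>';

    $doc = new DOMDocument();
    $doc->loadXML($xml); // parsing happens locally, from the string

    $xpath = new DOMXPath($doc);
    // XPath prefixes are local aliases; only the URI strings have to match.
    $xpath->registerNamespace('h', 'http://www.w3.org/1999/xhtml');
    $xpath->registerNamespace('person', 'http://sample.xml.com/Human');

    $name   = $xpath->evaluate('string(//h:body/person:name)');
    $height = $xpath->evaluate('string(//person:height)');
    $unit   = $xpath->evaluate('string(//person:height/@unit)');

    // A URI that differs by a single character names a different namespace.
    $xpath->registerNamespace('wrong', 'http://sample.xml.com/human');
    $missing = $xpath->evaluate('string(//wrong:name)');

    echo "$name is $height $unit tall\n"; // John is 6 feet tall
    var_dump($missing);                   // string(0) ""
    ```

    The last query returns an empty string rather than an error, which is the same symptom you get from a mistyped namespace URI.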

    $result = $xpath->evaluate('string(//atom:entry[3]/slash:comments)');
    

    Could this be the reason why I get an empty string while trying to retrieve the value of the <slash> element from an RSS feed?

    No.
    First, check that the source XML contains the correct xmlns declarations and that there is a <slash:comments> node inside the third atom entry. XPath indexing is one-based, and the predicate applies per parent: //atom:entry[1] selects every entry that is the first entry child of its own parent, //atom:entry[2] every second one, and so on.
    If so, I suspect that you forgot to register the atom namespace.
    Try something like this (adapted from a user-contributed note in the DOMXPath::registerNamespace documentation):

    $doc = new DOMDocument;
    $doc->loadXML($xml); // your xml string here
    $xpath = new DOMXPath($doc);
    
    $xpath->registerNamespace('atom', "http://www.w3.org/2005/Atom");
    $xpath->registerNamespace('slash', 'http://purl.org/rss/1.0/modules/slash/');
    
    $result = $xpath->evaluate('string(//atom:entry[3]/slash:comments)');
    

    You can see this running at http://codepad.org/JX8RpaKu

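    Indeed, XPath 1.0 has no notion of a default namespace: an unprefixed name in an expression always means "no namespace". So even when the feed declares Atom as its default namespace and its elements carry no prefix, you must register a prefix for that URI and use it in the query.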
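
    As a rough sketch of that failure mode, the following compares an unqualified and a qualified query. The three-entry feed is made up for this example and is not the feed from the question.

    ```php
    <?php
    // Hypothetical feed, invented for illustration: Atom is the default
    // namespace, so <feed> and <entry> carry no prefix in the source.
    $xml = '<feed xmlns="http://www.w3.org/2005/Atom"
          xmlns:slash="http://purl.org/rss/1.0/modules/slash/">
      <entry><title>First</title><slash:comments>4</slash:comments></entry>
      <entry><title>Second</title><slash:comments>15</slash:comments></entry>
      <entry><title>Third</title><slash:comments>23</slash:comments></entry>
    </feed>';

    $doc = new DOMDocument();
    $doc->loadXML($xml);

    $xpath = new DOMXPath($doc);
    $xpath->registerNamespace('slash', 'http://purl.org/rss/1.0/modules/slash/');

    // Unprefixed "entry" means "entry in no namespace", so nothing matches.
    $unqualified = $xpath->evaluate('string(//entry[3]/slash:comments)');

    // Once a prefix is bound to the Atom URI, the qualified path resolves.
    $xpath->registerNamespace('atom', 'http://www.w3.org/2005/Atom');
    $qualified = $xpath->evaluate('string(//atom:entry[3]/slash:comments)');

    var_dump($unqualified); // string(0) ""
    var_dump($qualified);   // string(2) "23"
    ```

    The prefix "atom" is only a local alias; any other name bound to the same URI would work equally well.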