How to parse an XML with xhtml:link in PHP?

The goal:

  • import an external XML file (for this example, it's inline)
  • get the <loc>, save into variable
  • find the <xhtml:link> that has the hreflang="fr-ca" attribute, get the href value, save into variable
  • insert both in the DB

Problem I have: I cannot get PHP to even recognize that xhtml:link is a child node of the <url> item; even when I simply print the nodeValue of the <url>, it omits all <xhtml:link> child nodes.

Code I am using/tried:

$xml = <<< XML
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="" xmlns:image="" xmlns:xhtml="">
<url xmlns="">
  <xhtml:link xmlns:xhtml="" rel="alternate" hreflang="en-ae" href="" />
  <xhtml:link xmlns:xhtml="" rel="alternate" hreflang="de-at" href="" />
  <xhtml:link xmlns:xhtml="" rel="alternate" hreflang="en-au" href="" />
  <xhtml:link xmlns:xhtml="" rel="alternate" hreflang="en-ca" href="" />
  <xhtml:link xmlns:xhtml="" rel="alternate" hreflang="fr-ca" href="" />
</url>
<url xmlns="">
  <xhtml:link xmlns:xhtml="" rel="alternate" hreflang="en-ae" href="" />
  <xhtml:link xmlns:xhtml="" rel="alternate" hreflang="de-at" href="" />
  <xhtml:link xmlns:xhtml="" rel="alternate" hreflang="en-au" href="" />
  <xhtml:link xmlns:xhtml="" rel="alternate" hreflang="fr-be" href="" />
  <xhtml:link xmlns:xhtml="" rel="alternate" hreflang="nl-be" href="" />
  <xhtml:link xmlns:xhtml="" rel="alternate" hreflang="en-bh" href="" />
  <xhtml:link xmlns:xhtml="" rel="alternate" hreflang="en-ca" href="" />
  <xhtml:link xmlns:xhtml="" rel="alternate" hreflang="fr-ca" href="" />
</url>
</urlset>
XML;

$urlsxml = new DOMDocument;
$urlsxml->loadXML($xml);
$urls = $urlsxml->getElementsByTagName('url');

for ($i = 0; $i < $urls->length; $i++) {

      echo $urls->item($i)->nodeValue;
      echo $urls->getElementsByTagName("xhtml:link")->attributes->getNamedItem("hreflang")->nodeValue;

}



  • The XML uses two namespaces: a default namespace without a prefix and a second one bound to the prefix xhtml. To read XML with namespaces, use the *NS variants of the DOM methods.

    $urls = $urlsxml->getElementsByTagNameNS(
      '', 'url'
    );
    $urls[$i]->getElementsByTagNameNS('', 'link');

    The first argument is the namespace URI; the second argument is the local name (the node name without the prefix). It is a good idea to keep the namespace URIs in constants or variables in this case.
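As a minimal sketch of the *NS approach with the namespace URIs held in variables: the two URIs below are the standard sitemap and XHTML namespaces, which is an assumption because the URIs in the question's XML are not shown. The sample document and the example.com URLs are made up for illustration.

```php
<?php
// Assumed namespace URIs (standard sitemap and XHTML); replace with the ones in your file.
$sitemapNs = 'http://www.sitemaps.org/schemas/sitemap/0.9';
$xhtmlNs   = 'http://www.w3.org/1999/xhtml';

// Hypothetical sample document, for illustration only.
$xml = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <url>
    <loc>https://example.com/en-ca/page</loc>
    <xhtml:link rel="alternate" hreflang="en-ca" href="https://example.com/en-ca/page"/>
    <xhtml:link rel="alternate" hreflang="fr-ca" href="https://example.com/fr-ca/page"/>
  </url>
</urlset>
XML;

$document = new DOMDocument();
$document->loadXML($xml);

$rows = [];
foreach ($document->getElementsByTagNameNS($sitemapNs, 'url') as $url) {
    $row = ['loc' => null, 'fr-ca' => null];

    // <loc> is in the sitemap namespace
    $loc = $url->getElementsByTagNameNS($sitemapNs, 'loc')->item(0);
    if ($loc !== null) {
        $row['loc'] = trim($loc->textContent);
    }

    // <xhtml:link> is in the XHTML namespace; pass the local name "link", not "xhtml:link"
    foreach ($url->getElementsByTagNameNS($xhtmlNs, 'link') as $link) {
        if ($link->getAttribute('hreflang') === 'fr-ca') {
            $row['fr-ca'] = $link->getAttribute('href');
            break;
        }
    }

    $rows[] = $row;
}

print_r($rows);
```

Each entry in $rows then holds the two values the question wants to store, with null marking a url element that has no loc or no fr-ca alternate.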

    A more comfortable option is XPath. It lets you use location paths and conditions to fetch nodes.

    $document = new DOMDocument;
    $document->loadXML($xml);
    // create an XPath instance for the document
    $xpath = new DOMXPath($document);
    // register the namespaces under your own prefixes
    $xpath->registerNamespace('s', '');
    $xpath->registerNamespace('x', '');
    // iterate over all sitemap url elements
    foreach ($xpath->evaluate('//s:url') as $url) {
      $data = [
        // get the sitemap loc child element as a string
        'loc' => $xpath->evaluate('string(s:loc)', $url),
        // get the href attribute of the xhtml link element (with language condition)
        'fr-ca' => $xpath->evaluate('string(x:link[@hreflang="fr-ca"]/@href)', $url),
      ];
      var_dump($data);
    }

    array(2) {
      ["loc"]=>
      string(58) ""
      ["fr-ca"]=>
      string(58) ""
    }
    array(2) {
      ["loc"]=>
      string(58) ""
      ["fr-ca"]=>
      string(58) ""
    }

    The string() function in XPath casts the first node of a node list to a string, so you can avoid explicit access to the node object's properties. For example, $xpath->evaluate('s:loc', $url)->item(0)->textContent; can be written as $xpath->evaluate('string(s:loc)', $url);. Unlike the property access, the XPath cast does not fail with an error if no matching node exists; it returns an empty string.
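The difference between the two access styles can be sketched with a url element that has no fr-ca link. The namespace URIs are again the standard sitemap and XHTML ones (an assumption, since the question's URIs are not shown), and the document and example.com URLs are hypothetical.

```php
<?php
// Hypothetical document: the <url> has a <loc> but no fr-ca alternate link.
$xml = <<<XML
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <url>
    <loc>https://example.com/page</loc>
    <xhtml:link rel="alternate" hreflang="en-ca" href="https://example.com/en-ca/page"/>
  </url>
</urlset>
XML;

$document = new DOMDocument();
$document->loadXML($xml);

$xpath = new DOMXPath($document);
// Assumed namespace URIs, bound to prefixes chosen for the expressions below
$xpath->registerNamespace('s', 'http://www.sitemaps.org/schemas/sitemap/0.9');
$xpath->registerNamespace('x', 'http://www.w3.org/1999/xhtml');

$url = $xpath->evaluate('//s:url')->item(0);

// Node-list access: item(0) is null when nothing matches, so it needs a guard.
$node = $xpath->evaluate('x:link[@hreflang="fr-ca"]/@href', $url)->item(0);
$viaNode = $node !== null ? $node->nodeValue : '';

// string() cast: no guard needed, a missing node yields an empty string.
$viaCast = $xpath->evaluate('string(x:link[@hreflang="fr-ca"]/@href)', $url);

// An existing node is returned as its text content.
$loc = $xpath->evaluate('string(s:loc)', $url);

var_dump($viaNode, $viaCast, $loc);
```

Because a missing link and an empty href both come back as an empty string, check for '' before inserting the row if the distinction matters for your data.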