The goal:
Problem I have: I cannot get PHP to recognize that xhtml:link is a child node of the <url> element; even when I simply output the nodeValue of the <url>, it omits all <xhtml:link> child nodes.
Code I am using/tried:
<?php
$xml = <<< XML
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:image="http://www.google.com/schemas/sitemap-image/1.1" xmlns:xhtml="http://www.w3.org/1999/xhtml">
<url xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<loc>https://www.example.com/ca/en/cat/categories/series/07660/</loc>
<lastmod>2018-11-07</lastmod>
<changefreq>daily</changefreq>
<priority>1.0</priority>
<xhtml:link xmlns:xhtml="http://www.w3.org/1999/xhtml" rel="alternate" hreflang="en-ae" href="https://www.example.com/ae/en/cat/categories/series/07660/" />
<xhtml:link xmlns:xhtml="http://www.w3.org/1999/xhtml" rel="alternate" hreflang="de-at" href="https://www.example.com/at/de/cat/07660/" />
<xhtml:link xmlns:xhtml="http://www.w3.org/1999/xhtml" rel="alternate" hreflang="en-au" href="https://www.example.com/au/en/cat/categories/series/07660/" />
<xhtml:link xmlns:xhtml="http://www.w3.org/1999/xhtml" rel="alternate" hreflang="en-ca" href="https://www.example.com/ca/en/cat/categories/series/07660/" />
<xhtml:link xmlns:xhtml="http://www.w3.org/1999/xhtml" rel="alternate" hreflang="fr-ca" href="https://www.example.com/ca/fr/cat/categories/series/07660/" />
</url>
<url xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<loc>https://www.example.com/ca/en/cat/categories/series/07683/</loc>
<lastmod>2018-11-07</lastmod>
<changefreq>daily</changefreq>
<priority>1.0</priority>
<xhtml:link xmlns:xhtml="http://www.w3.org/1999/xhtml" rel="alternate" hreflang="en-ae" href="https://www.example.com/ae/en/cat/categories/series/07683/" />
<xhtml:link xmlns:xhtml="http://www.w3.org/1999/xhtml" rel="alternate" hreflang="de-at" href="https://www.example.com/at/de/cat/07683/" />
<xhtml:link xmlns:xhtml="http://www.w3.org/1999/xhtml" rel="alternate" hreflang="en-au" href="https://www.example.com/au/en/cat/categories/series/07683/" />
<xhtml:link xmlns:xhtml="http://www.w3.org/1999/xhtml" rel="alternate" hreflang="fr-be" href="https://www.example.com/be/fr/collections/07683/" />
<xhtml:link xmlns:xhtml="http://www.w3.org/1999/xhtml" rel="alternate" hreflang="nl-be" href="https://www.example.com/be/nl/collecties/07683/" />
<xhtml:link xmlns:xhtml="http://www.w3.org/1999/xhtml" rel="alternate" hreflang="en-bh" href="https://www.example.com/bh/en/cat/07683/" />
<xhtml:link xmlns:xhtml="http://www.w3.org/1999/xhtml" rel="alternate" hreflang="en-ca" href="https://www.example.com/ca/en/cat/categories/series/07683/" />
<xhtml:link xmlns:xhtml="http://www.w3.org/1999/xhtml" rel="alternate" hreflang="fr-ca" href="https://www.example.com/ca/fr/cat/categories/series/07683/" />
</url>
</urlset>
XML;
$urlsxml = new DOMDocument;
$urlsxml->loadXML($xml);
$urls = $urlsxml->getElementsByTagName('url');
for ($i = 0; $i < $urls->length; $i++) {
echo $urls->item($i)->nodeValue;
echo $urls->getElementsByTagName("xhtml:link")->attributes->getNamedItem("hreflang")->nodeValue;
// INSERT INTO DB
}
?>
Out of ideas; any help would be appreciated.
The XML uses two namespaces: http://www.sitemaps.org/schemas/sitemap/0.9 without an alias and http://www.w3.org/1999/xhtml with the alias xhtml. To read XML with namespaces you should use the *NS variants of the DOM methods.
$urls = $urlsxml->getElementsByTagNameNS(
'http://www.sitemaps.org/schemas/sitemap/0.9', 'url'
);
$urls[$i]->getElementsByTagNameNS('http://www.w3.org/1999/xhtml', 'link');
The first argument is the namespace URI, the second argument is the local name (the node name without the prefix). It would be a good idea to use a constant or variable for the namespace URIs in this case.
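A minimal sketch of the *NS approach with the namespace URIs held in constants. The constant names NS_SITEMAP and NS_XHTML, the $rows structure, and the shortened sample document are assumptions made for illustration; they are not part of the original code.

```php
<?php
// Hypothetical constant names for the two namespace URIs used by the sitemap.
const NS_SITEMAP = 'http://www.sitemaps.org/schemas/sitemap/0.9';
const NS_XHTML = 'http://www.w3.org/1999/xhtml';

// Shortened sample document (assumption: same structure as the sitemap in the question).
$xml = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xhtml="http://www.w3.org/1999/xhtml">
<url>
<loc>https://www.example.com/ca/en/cat/categories/series/07660/</loc>
<xhtml:link rel="alternate" hreflang="en-ca" href="https://www.example.com/ca/en/cat/categories/series/07660/" />
<xhtml:link rel="alternate" hreflang="fr-ca" href="https://www.example.com/ca/fr/cat/categories/series/07660/" />
</url>
</urlset>
XML;

$document = new DOMDocument();
$document->loadXML($xml);

$rows = [];
// Match <url> elements by namespace URI and local name.
foreach ($document->getElementsByTagNameNS(NS_SITEMAP, 'url') as $url) {
    // item(0) is null if the <url> has no <loc> child.
    $loc = $url->getElementsByTagNameNS(NS_SITEMAP, 'loc')->item(0);

    // Map each hreflang value to its href for the xhtml:link children.
    $alternates = [];
    foreach ($url->getElementsByTagNameNS(NS_XHTML, 'link') as $link) {
        $alternates[$link->getAttribute('hreflang')] = $link->getAttribute('href');
    }

    $rows[] = [
        'loc' => $loc !== null ? $loc->textContent : '',
        'alternates' => $alternates,
    ];
}

var_dump($rows);
```

Each entry in $rows then holds everything needed for one database insert. The hreflang and href attributes carry no namespace, so plain getAttribute() is enough for them.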
A more comfortable option is XPath. It lets you use location paths and conditions to fetch nodes.
$document = new DOMDocument;
$document->loadXML($xml);
// create an xpath instance for the document
$xpath = new DOMXPath($document);
// register the namespaces for your own prefixes
$xpath->registerNamespace('s', 'http://www.sitemaps.org/schemas/sitemap/0.9');
$xpath->registerNamespace('x', 'http://www.w3.org/1999/xhtml');
// iterate all sitemap url elements
foreach ($xpath->evaluate('//s:url') as $url) {
    $data = [
        // get the sitemap loc child element as a string
        'loc' => $xpath->evaluate('string(s:loc)', $url),
        // get the href attribute of the xhtml link element (with language condition)
        'fr-ca' => $xpath->evaluate('string(x:link[@hreflang="fr-ca"]/@href)', $url),
    ];
    var_dump($data);
}
Output:
array(2) {
["loc"]=>
string(58) "https://www.example.com/ca/en/cat/categories/series/07660/"
["fr-ca"]=>
string(58) "https://www.example.com/ca/fr/cat/categories/series/07660/"
}
array(2) {
["loc"]=>
string(58) "https://www.example.com/ca/en/cat/categories/series/07683/"
["fr-ca"]=>
string(58) "https://www.example.com/ca/fr/cat/categories/series/07683/"
}
The string() function in XPath casts the first node in a list into a string. It allows you to avoid explicit access to the node object properties. For example, $xpath->evaluate('s:loc', $url)->item(0)->textContent; can be written as $xpath->evaluate('string(s:loc)', $url);. Unlike the property access, the XPath cast will not fail with an error if no matching node exists. It will return an empty string.
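To make the difference concrete, here is a small sketch that looks up a language that does not exist in the document, once as a node list and once through the string() cast, and then collects all alternates in one pass. The shortened sample document and the variable names $missingNode, $missingHref and $alternates are assumptions for illustration only.

```php
<?php
// Shortened sample document (assumption: same structure as the sitemap in the question).
$xml = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xhtml="http://www.w3.org/1999/xhtml">
<url>
<loc>https://www.example.com/ca/en/cat/categories/series/07660/</loc>
<xhtml:link rel="alternate" hreflang="en-ca" href="https://www.example.com/ca/en/cat/categories/series/07660/" />
<xhtml:link rel="alternate" hreflang="fr-ca" href="https://www.example.com/ca/fr/cat/categories/series/07660/" />
</url>
</urlset>
XML;

$document = new DOMDocument();
$document->loadXML($xml);

$xpath = new DOMXPath($document);
$xpath->registerNamespace('s', 'http://www.sitemaps.org/schemas/sitemap/0.9');
$xpath->registerNamespace('x', 'http://www.w3.org/1999/xhtml');

// Use the first <url> element as the context node.
$url = $xpath->evaluate('//s:url')->item(0);

// A location path returns a DOMNodeList; item(0) is null when nothing matches,
// so it has to be checked before reading any property from it.
$missingNode = $xpath->evaluate('x:link[@hreflang="de-at"]/@href', $url)->item(0);
var_dump($missingNode); // NULL

// The string() cast returns an empty string for the same missing node.
$missingHref = $xpath->evaluate('string(x:link[@hreflang="de-at"]/@href)', $url);
var_dump($missingHref); // string(0) ""

// Collect every alternate link of this <url> as hreflang => href.
$alternates = [];
foreach ($xpath->evaluate('x:link[@rel="alternate"]', $url) as $link) {
    $alternates[$link->getAttribute('hreflang')] = $link->getAttribute('href');
}
var_dump($alternates);
```

The loop at the end is one way to avoid writing a separate expression per language when all hreflang values need to be stored.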