Getting link tag via DOMDocument


I convert an Atom feed into RSS using atom2rss.xsl. That part works fine.

Then, using DOMDocument, I try to get the post title and URL:

$feed = new DOMDocument();
$feed->loadHTML('<?xml encoding="utf-8" ?>' . $html);

if (!empty($feed) && is_object($feed) ) {
    foreach ($feed->getElementsByTagName("item") as $item){
        echo 'url: '. $item->getElementsByTagName("link")->item(0)->nodeValue;
        echo 'title'. $item->getElementsByTagName("title")->item(0)->nodeValue;
    }
    return;
}

But the post URL is empty.

See this eval which contains HTML. What am I doing wrong? I suspect I am not getting the link tag properly via $item->getElementsByTagName("link")->item(0)->nodeValue.


Solution

  • I think the problem is that each item contains several <link> elements, and the one (I think) you're interested in is the one with the attribute rel="self". The quickest way (without messing around with XPath) is to loop over the <link> elements, check each for the right rel value, and take the href attribute from the one that matches:

    if (!empty($feed) && is_object($feed) ) {
        foreach ($feed->getElementsByTagName("item") as $item){
            $url = "";
            // Look for the 'right' link tag and extract URL from that
            foreach ( $item->getElementsByTagName("link") as $link )    {
                if ( $link->getAttribute("rel") == "self" ) {
                    $url = $link->getAttribute("href");
                    break;
                }
            }
            echo 'url: '. $url;
            // Separate the title from the URL so the output is readable
            echo ', title: '. $item->getElementsByTagName("title")->item(0)->nodeValue . PHP_EOL;
        }
        return;
    }
    

    which gives...

    url: https://www.blogger.com/feeds/2984353310628523257/posts/default/1947782625877709813, title: Extraordinary Genius - Cp274
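
For comparison, the XPath route mentioned above might look something like the following. This is only a sketch: the sample feed, its example.com URLs and the $results array are made up for illustration, and it assumes the <link> elements carry no namespace prefix (a namespaced feed parsed with loadXML would need DOMXPath::registerNamespace first).

```php
<?php
// Hypothetical sample feed; the real input would be the atom2rss.xsl output.
$xml = <<<'XML'
<rss version="2.0">
  <channel>
    <item>
      <title>Example post</title>
      <link rel="alternate" href="https://example.com/post.html"/>
      <link rel="self" href="https://example.com/feeds/posts/1"/>
    </item>
  </channel>
</rss>
XML;

$feed = new DOMDocument();
$feed->loadXML($xml);
$xpath = new DOMXPath($feed);

$results = [];
foreach ($xpath->query('//item') as $item) {
    // Paths are evaluated relative to the current <item>.
    // string(...) yields '' when no node matches, so no null checks are needed.
    $url   = $xpath->evaluate('string(link[@rel="self"]/@href)', $item);
    $title = $xpath->evaluate('string(title)', $item);

    $results[] = ['url' => $url, 'title' => $title];
    echo 'url: ' . $url . ', title: ' . $title . PHP_EOL;
}
```

The predicate [@rel="self"] does the same filtering as the inner foreach loop, so which version to use is mostly a matter of taste.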