Traversing the XML response from Yandex API using PHP


I am creating a metasearch engine using the Yandex API. Yandex returns its results in XML format, so we need to traverse the XML response in order to get the different fields such as URL, title, description, etc.

The XML response by Yandex is as follows: http://pastebin.com/kAVAVri9

This is how I have implemented it:

$dom5 = new DOMDocument();

if ($dom5->loadXML($site_results)) {

    $results  = $dom5->getElementsByTagName("response");
    $results1 = $results->getElementsByTagName("results");
    $results2 = $results1->getElementsByTagName("group");


    $totals["yandex"] = 1000;


    foreach ($results1 as $link) {

        $url = $link->getElementsByTagName("doc")->item(2)->nodeValue;
        ;
        $url = str_replace('http://', '', $url);
        if (substr($url, -1, 1) == '/') {
            $url = substr($url, 0, strlen($url) - 1);
        }
        $search_results[$i]["url"] = $url;

        $title                       = $link->getElementsByTagName("doc")->item(4)->nodeValue;
        $search_results[$i]["title"] = $title;
        $test                        = $link->getElementsByTagName("doc");
        $test1                       = $test->getElementsByTagName("title");
        $desc                        = $test1->getElementsByTagName("headline")->item(0)->nodeValue;
        $search_results[$i]["desc"]  = $desc;

        $search_results[$i]["engine"]   = 'yandex';
        $search_results[$i]["position"] = $i + 1;
        $i++;

    }
}

I am new to PHP, so please forgive me if I have made a stupid mistake. I am unable to retrieve the results with my implementation. Please help me find the mistake and get the necessary fields from the XML response. Thank you!


Solution

  • The method getElementsByTagName() returns a DOMNodeList:

    $results  = $dom5->getElementsByTagName("response");
    

    The DOMNodeList does not have a method called getElementsByTagName(), but you call it:

    $results1 = $results->getElementsByTagName("results");
    

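    That call is what triggers the fatal error: whenever you call a method that an object does not have, PHP raises a fatal error ("Call to undefined method DOMNodeList::getElementsByTagName()") and your script stops working.

    Call getElementsByTagName() on the individual nodes in the list (for example via item() or foreach) rather than on the list itself, and you should be fine.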
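
    As a rough sketch of how the DOM-based loop from the question could look once every call is made on an element instead of on the list: the XML string below is a hypothetical, heavily trimmed stand-in for the real Yandex response (the root element name and the sample values are assumptions), and the $field closure is a helper introduced only for this illustration.

    ```php
    <?php
    // Hypothetical, heavily trimmed stand-in for the Yandex response.
    $site_results = '<yandexsearch><response><results><grouping>'
        . '<group><doc><url>http://www.example.com/</url>'
        . '<title>Example title</title>'
        . '<headline>Example headline</headline></doc></group>'
        . '<group><doc><url>https://example.org/page</url>'
        . '<title>Second title</title></doc></group>'
        . '</grouping></results></response></yandexsearch>';

    $search_results = [];
    $dom = new DOMDocument();

    if ($dom->loadXML($site_results)) {
        // getElementsByTagName() searches all descendants, so <group> can be
        // requested from the document directly. The result is a DOMNodeList,
        // which can be iterated with foreach.
        foreach ($dom->getElementsByTagName('group') as $group) {
            // $group is a DOMElement, which does have getElementsByTagName().
            $field = function ($name) use ($group) {
                $node = $group->getElementsByTagName($name)->item(0);
                // item() returns null when the element is missing.
                return $node === null ? '' : trim($node->textContent);
            };

            // Strip the scheme and any trailing slash, as in the question.
            $url = preg_replace('~^https?://~', '', $field('url'));

            $search_results[] = [
                'url'      => rtrim($url, '/'),
                'title'    => $field('title'),
                'desc'     => $field('headline'),
                'engine'   => 'yandex',
                'position' => count($search_results) + 1,
            ];
        }
    }

    print_r($search_results);
    ```

    The second group has no headline on purpose, to show that a missing element ends up as an empty string instead of another fatal error.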

    Apart from these basics, I normally suggest SimpleXML for parsing XML documents like this one. This XML file is a little unusual, however, so I suggest extending SimpleXMLElement and adding the features you are likely to need, borrowing partly from regular expressions and partly from DOMDocument.

    One concept you should know about when parsing these XML files is XPath. For example, to reach the elements you had so many problems with above, you can write the path literally:

    /*/response/results/grouping/group
    

    In PHP with SimpleXML this looks like:

    $url = 'http://pastebin.com/raw.php?i=kAVAVri9';
    $xml = simplexml_load_file($url, 'MySimpleXML');
    foreach ($xml->xpath('/*/response/results/grouping/group') as $link) {
        # ... operate on $link
    }
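
    To try the path without the custom class or a network request, here is a minimal sketch using the stock SimpleXMLElement. The inline XML is a hypothetical, trimmed stand-in for the real response; the root element name and the URLs are assumptions.

    ```php
    <?php
    // Hypothetical, trimmed stand-in for the Yandex response.
    $xml = simplexml_load_string(
        '<yandexsearch><response><results><grouping>'
        . '<group><doc><url>http://www.example.com/</url></doc></group>'
        . '<group><doc><url>https://example.org/page</url></doc></group>'
        . '</grouping></results></response></yandexsearch>'
    );

    // "/*" matches the root element whatever its name is; the remaining
    // steps name each child element on the way down to <group>.
    $groups = $xml->xpath('/*/response/results/grouping/group');

    $urls = [];
    foreach ($groups as $group) {
        // Casting a SimpleXMLElement to string yields its text content.
        $urls[] = (string) $group->doc->url;
    }

    print_r($urls);
    ```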
    

    A larger example:

    $url = 'http://pastebin.com/raw.php?i=kAVAVri9';
    $url = '../data/yandex.xml'; # overrides the line above with a local file path
    $xml = simplexml_load_file($url, 'MySimpleXML');
    foreach ($xml->xpath('/*/response/results/grouping/group') as $link) {
        $url      = $link->doc->url->str()->preg('~^https?://(.*?)/*$~u', '$1');
        $title    = $link->doc->title->text();
        $headline = $link->doc->headline->text();
        printf("<%s> %s\n%s\n\n", $url, $title, wordwrap($headline));
    }
    

    And its example output:

    <www.facebook.com> " Facebook" - a social networking service
    Allows users to find and communicate with friends, classmates and
    colleagues, share thoughts, photos and videos, and join various groups.
    
    <en.wikipedia.org/wiki/Facebook>  Facebook - Wikipedia, the free encyclopedia
     Facebook is a social networking service launched in February 2004, owned
    and operated by Facebook, Inc. As of September 2012, Facebook has over one
    billion active users, more than half of them using Facebook on a mobile
    device.
    
    <mashable.com/category/facebook>  Facebook 
    
    ...
    

    The PHP code example above needs some more code to work, because it extends SimpleXMLElement for ease of use. That is done with the following code:

    class MySimpleXML extends SimpleXMLElement
    {
        public function text()
        {
            $string = null === $this[0] ? ''
                : (dom_import_simplexml($this)->textContent);
    
            return $this->str($string)->normalizeWS();
        }
    
        public function str($string = null)
        {
            return new MyString($string ?: $this);
        }
    }
    
    class MyString
    {
        private $string;
    
        public function __construct($string)
        {
            $this->string = $string;
        }
    
        public function preg($pattern, $replacement)
        {
            return new self(preg_replace($pattern, $replacement, $this));
        }
    
        public function normalizeWS()
        {
            return $this->preg('~\s+~', ' ');
        }
    
        public function __toString()
        {
            return (string) $this->string;
        }
    }
    

    This might all be a bit much to begin with; check out the PHP manual for SimpleXML and the other functions used in the code example.
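
    If the wrapper classes feel like too much, the two string operations they perform can also be written as plain functions. This is only a sketch: the function names are hypothetical, the patterns are the ones used above, and the type declarations assume PHP 7 or later.

    ```php
    <?php
    // Hypothetical helper names; they mirror what MyString::preg() and
    // MyString::normalizeWS() are used for above.
    function strip_scheme_and_trailing_slashes(string $url): string
    {
        // Drop a leading http:// or https:// and any run of trailing slashes.
        // preg_replace() returns null on error, so fall back to the input.
        return preg_replace('~^https?://(.*?)/*$~u', '$1', $url) ?? $url;
    }

    function normalize_whitespace(string $text): string
    {
        // Collapse every run of whitespace (spaces, tabs, newlines) into one space.
        return preg_replace('~\s+~', ' ', $text) ?? $text;
    }

    echo strip_scheme_and_trailing_slashes('http://www.facebook.com/'), "\n";
    echo normalize_whitespace("Facebook \n\t Wikipedia"), "\n";
    ```

    With these, the loop body would use something like strip_scheme_and_trailing_slashes((string) $link->doc->url) on a stock SimpleXMLElement instead of the chained method calls.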