php xpath domxpath

Getting div value (content/text) using XPath


I have the following HTML structure:

<li id="REQUIRED_ITEM_1" class="listing-post">

    <a class="listing-thumb" href="blah" title="blah" data-palette-listing-image="">

        <img src="REQUIRED_ITEM_2" width="75" height="75" alt="blah"> </a>

    <div class="listing-detail ">

        <div class="listing-title">

            <div class="listing-icon hidden"></div>

              <a href="REQUIRED_ITEM_3" class="title" title="REQUIRED_ITEM_4">blah</a>

              <div class="listing-maker">

                <span class="name wrap"><a href="REQUIRED_ITEM_5">blah</a></span>

              </div>

        </div>

        <div class="listing-date">
            REQUIRED_ITEM_6
        </div>

        <div class="listing-price">
            Sold
        </div>

    </div>
    </li>

There are a few dozen of these <li> elements on the same page, each with a different id and content. The content that I need is marked REQUIRED_ITEM_1 through REQUIRED_ITEM_6.

I am collecting the data from these <li> elements with the help of XPath.

Here is the code I use:

    foreach ($xpath->query("//li[@class='listing-post']") as $link) {
        $REQUIRED_ITEM_1 = $link->getAttribute('id');
        $REQUIRED_ITEM_2 = $xpath->query(".//img", $link)->item(0)->getAttribute('src');
        $REQUIRED_ITEM_3 = $xpath->query(".//a", $link)->item(1)->getAttribute('href');
        $REQUIRED_ITEM_4 = $xpath->query(".//a", $link)->item(1)->getAttribute('title');
        $REQUIRED_ITEM_5 = $xpath->query(".//a", $link)->item(2)->getAttribute('href');

        $REQUIRED_ITEM_6 = $xpath->query("./div/text", $link)->item(4);
    }

It works as intended for the first five REQUIRED_ITEMs. However, the code that should get the text contained within the listing-date div (REQUIRED_ITEM_6) seems to be wrong.

Also, is this the best way to parse my HTML and collect the data, or is there a better approach?


Solution

  • Here is the XPath to get REQUIRED_ITEM_6:

    //li[@class='listing-post']//div[@class='listing-date']/text()
    

    The following version may be a little faster, but the first version is safer because it depends less on the exact nesting of the document:

    //li[@class='listing-post']/div/div[@class='listing-date']/text()
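
To illustrate the trade-off between the two expressions, here is a minimal sketch that counts how many nodes each one matches. `countMatches()` is a hypothetical helper, and the "changed" markup with an extra wrapper div is an assumption made up for the comparison, not something taken from the real page.

```php
<?php
// Hypothetical helper: count the nodes an XPath expression matches in an HTML fragment.
function countMatches(string $html, string $expr): int
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // keep parser warnings out of the output
    $doc->loadHTML($html);
    libxml_clear_errors();

    return (new DOMXPath($doc))->query($expr)->length;
}

$loose  = "//li[@class='listing-post']//div[@class='listing-date']/text()";
$strict = "//li[@class='listing-post']/div/div[@class='listing-date']/text()";

// Same nesting as in the question: li > div > div.listing-date
$original = '<ul><li class="listing-post"><div class="listing-detail ">'
          . '<div class="listing-date">2013-01-01</div></div></li></ul>';

// Assumed variation: one extra wrapper div around the date
$changed = '<ul><li class="listing-post"><div class="listing-detail ">'
         . '<div class="wrapper"><div class="listing-date">2013-01-01</div></div>'
         . '</div></li></ul>';

$results = [
    'original' => [
        'loose'  => countMatches($original, $loose),
        'strict' => countMatches($original, $strict),
    ],
    'changed' => [
        'loose'  => countMatches($changed, $loose),
        'strict' => countMatches($changed, $strict),
    ],
];

print_r($results);
```

Both expressions match the date text in the original nesting, but only the `//` version still matches once the nesting changes.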
    

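    Your original query fails for two reasons: `text` without parentheses is a name test that looks for <text> child elements rather than text nodes, and `./div` only matches direct children of the <li>, whereas the listing-date div is nested one level deeper. Your code should therefore look something like this (you may need to adjust it slightly for your PHP; I'm not sure why you used item(4)):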

    // trim() strips the line breaks and indentation surrounding the date text
    $REQUIRED_ITEM_6 = trim($xpath->query(".//div[@class='listing-date']/text()", $link)->item(0)->textContent);
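
As for whether there is a better approach: one option, sketched below under assumptions, is to select each value by its class instead of by position (`item(1)`, `item(2)`), so the code keeps working if another link is added to the <li>. The sample markup and its values (`item-1`, `/listing/1`, and so on) are placeholders modelled on the question, and the array keys are arbitrary names. `DOMXPath::evaluate()` with the XPath functions `string()` and `normalize-space()` returns a plain string, and an empty string when nothing matches, which avoids calling a method on a null `item()`.

```php
<?php
// Placeholder markup modelled on the structure in the question.
$html = '<ul>
<li id="item-1" class="listing-post">
    <a class="listing-thumb" href="/thumb" title="t">
        <img src="/img/1.jpg" width="75" height="75" alt="a"> </a>
    <div class="listing-detail ">
        <div class="listing-title">
            <div class="listing-icon hidden"></div>
            <a href="/listing/1" class="title" title="First listing">First</a>
            <div class="listing-maker">
                <span class="name wrap"><a href="/maker/1">Maker</a></span>
            </div>
        </div>
        <div class="listing-date">
            Jan 1, 2013
        </div>
        <div class="listing-price">
            Sold
        </div>
    </div>
</li>
</ul>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // real-world HTML often triggers parser warnings
$doc->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

$rows = [];
foreach ($xpath->query("//li[@class='listing-post']") as $li) {
    $rows[] = [
        'id'    => $li->getAttribute('id'),
        // string() takes the first matching node, relative to the current <li>
        'image' => $xpath->evaluate("string(.//img/@src)", $li),
        'url'   => $xpath->evaluate("string(.//a[@class='title']/@href)", $li),
        'title' => $xpath->evaluate("string(.//a[@class='title']/@title)", $li),
        'maker' => $xpath->evaluate("string(.//div[@class='listing-maker']//a/@href)", $li),
        // normalize-space() trims and collapses the surrounding whitespace
        'date'  => $xpath->evaluate("normalize-space(.//div[@class='listing-date'])", $li),
    ];
}

print_r($rows);
```

Note that `@class='listing-post'` is an exact string comparison, so it would stop matching if the element gained a second class; that is the case for `listing-detail `, which carries a trailing space in the markup above.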