Tags: php, xml, xpath, domdocument, domxpath

Get a div from an external page, then delete another div from it


I need a little help with getting content from external web pages.

I need to get a div and then delete another div from inside it. Can someone help me with my code?

This is the relevant portion of the page's HTML:

<html>
    ...
    <body class="domain-4 page-product-detail" > ...

         <div id="informacio" class="htab-fragment"> <!-- must select this -->
            <h2 class="description-heading htab-name">Trip description</h2>
            <div class="htab-mobile tab-content">
                <p class="tab-annot">* Hivatalos ismertető</p>

                <div id="trip-detail-question"> <!-- must delete this -->
                    <form> ...</form>
                </div>

                <h3>USP</h3><p>Large, well-organized and family-friendly ...</p>
                <div class="message warning-message">
                    <p>Prices already include all current discounts!</p>
                    <span class="ico"></span>
                </div>
            </div>
        </div>
        ... 
    </body>
</html>

I need to get the div with id="informacio" and then delete the div with id="trip-detail-question" from it, including the form it contains.

This is my code, but it's not working correctly :(

function get_content($url){

    $doc = new DOMDocument;

    $doc->preserveWhiteSpace = false;
    $doc->strictErrorChecking = false;
    $doc->recover = true;

    $doc->loadHTMLFile($url);

    $xpath = new DOMXPath($doc);

    $query = "//div[@id='informacio']";
    $entries = $xpath->query($query)->item(0);

    foreach($xpath->query("div[@id='trip-detail-question']", $entries) as $node)
        $node->parentNode->removeChild($node);

    $var = $doc->saveXML($entries);
    return $var;
}

Solution

  • Your second XPath expression is incorrect. Because it is evaluated in the context of the div you selected first, it only looks for a div that is a direct child of that node. In other words, you are trying to select:

    //div[@id='informacio']/div[@id='trip-detail-question']
    

    and that node does not exist, because trip-detail-question is nested one level deeper. You want this node:

    //div[@id='informacio']/div/div[@id='trip-detail-question']
    

    which you can also select like this (allowing any element, not just div):

    //div[@id='informacio']/*/div[@id='trip-detail-question']
    

    or like this (allowing any depth of nesting):

    //div[@id='informacio']//div[@id='trip-detail-question']
    

    In the context of the first div, the correct XPath expression would be:

    .//div[@id='trip-detail-question']
    

    If you change it in your code, it should work:

    // ".//" matches any descendant of $entries; a bare "div[...]" only matches direct children
    foreach($xpath->query(".//div[@id='trip-detail-question']", $entries) as $node) {
        $node->parentNode->removeChild($node);
    }
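To see the difference between these expressions, here is a small self-contained sketch. It assumes a trimmed-down copy of the markup from the question (the real page has more content) and loads it from a string instead of a URL, then counts how many nodes each relative expression matches when evaluated against the `informacio` div as the context node.

```php
<?php
// Trimmed-down copy of the question's markup (assumption: only the nesting matters here).
$html = <<<'HTML'
<html><body>
<div id="informacio" class="htab-fragment">
  <h2 class="description-heading htab-name">Trip description</h2>
  <div class="htab-mobile tab-content">
    <p class="tab-annot">* Official description</p>
    <div id="trip-detail-question"><form></form></div>
    <h3>USP</h3>
  </div>
</div>
</body></html>
HTML;

$doc = new DOMDocument();
libxml_use_internal_errors(true); // collect libxml warnings instead of printing them
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$context = $xpath->query("//div[@id='informacio']")->item(0);

// Each relative expression is evaluated against $context.
$relative = [
    "div[@id='trip-detail-question']",    // child axis only: no match
    "*/div[@id='trip-detail-question']",  // grandchild under any element: one match
    ".//div[@id='trip-detail-question']", // any descendant: one match
];

$counts = [];
foreach ($relative as $expr) {
    $counts[$expr] = $xpath->query($expr, $context)->length;
    printf("%-38s => %d\n", $expr, $counts[$expr]);
}
```

The first expression is the one from the question; it finds nothing because the target div is a grandchild of `informacio`, not a child.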
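Putting it together, here is a minimal sketch of the whole function with the corrected expression. It is not the asker's exact code: `extract_informacio` is a hypothetical name, and it assumes the HTML is passed in as a string (for a live page you could pass `file_get_contents($url)`). It also silences libxml's warnings about malformed markup and returns `null` if the container div is missing.

```php
<?php
/**
 * Hypothetical helper: returns the markup of div#informacio with
 * div#trip-detail-question (and the form inside it) removed,
 * or null if div#informacio is not found.
 */
function extract_informacio(string $html): ?string
{
    $doc = new DOMDocument();

    // Real-world HTML is rarely valid; collect libxml warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $container = $xpath->query("//div[@id='informacio']")->item(0);
    if ($container === null) {
        return null; // page layout changed or the div is missing
    }

    // ".//" searches all descendants of $container, not just its children.
    // XPath results are a snapshot, so removing nodes while looping is safe.
    foreach ($xpath->query(".//div[@id='trip-detail-question']", $container) as $node) {
        $node->parentNode->removeChild($node);
    }

    // saveHTML() with a node argument exports only that subtree (PHP 5.3.6+).
    return $doc->saveHTML($container);
}

$sample = '<html><body><div id="informacio"><div class="tab-content">'
    . '<div id="trip-detail-question"><form></form></div>'
    . '<h3>USP</h3></div></div></body></html>';

echo extract_informacio($sample), "\n";
```

Using `saveHTML()` instead of `saveXML()` keeps the output as HTML; either works with the corrected XPath.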