php dom domdocument domxpath

Getting the second p tag inside a specific ID from HTML using DOMDocument


How can I get content from the second <p> tag inside a div with ID mydiv using DOMDocument?

For example, my HTML might look like:

<div id='mydiv'>
<p><img src='xx.jpg'></p>
<p>i need here</p>
<p>lorem ipsum lorem ipsum</p>
</div>

I'm trying to extract the following text:

i need here

How can I do it?


Solution

  • Getting the contents from the nth <p> tag:

    Use DOMDocument::getElementsByTagName() to get all the <p> tags, then use item() to fetch the second one from the returned DOMNodeList and read its nodeValue. item() is zero-based, hence the $index - 1:

    $index = 2;
    
    $dom = new DOMDocument;
    $dom->loadHTML($html);
    $tags = $dom->getElementsByTagName('p');
    echo $tags->item(($index-1))->nodeValue; // to-do: check if that index exists
    
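    The to-do above can be handled by checking the result of item(), which returns null for an index that does not exist. The following is a minimal sketch, not part of the original answer: the function name nthParagraphText() is hypothetical, and the libxml_use_internal_errors() call is an assumption that you want to silence parser warnings for imperfect HTML.

    ```php
    <?php
    // Hypothetical helper: returns the text of the nth <p> element,
    // or null when the document has fewer than $index paragraphs.
    function nthParagraphText(string $html, int $index): ?string
    {
        $dom = new DOMDocument;
        // Assumption: collect libxml warnings internally instead of printing them.
        libxml_use_internal_errors(true);
        $dom->loadHTML($html);
        libxml_clear_errors();

        // DOMNodeList::item() is zero-based and returns null when out of range.
        $node = $dom->getElementsByTagName('p')->item($index - 1);

        return $node === null ? null : trim($node->nodeValue);
    }

    $html = "<div id='mydiv'><p><img src='xx.jpg'></p><p>i need here</p><p>lorem ipsum lorem ipsum</p></div>";

    echo nthParagraphText($html, 2), PHP_EOL; // i need here
    var_dump(nthParagraphText($html, 5));     // NULL
    ```

    Returning null instead of echoing directly lets the caller decide what to do when the paragraph is missing.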

  • Getting the contents from the nth <p> tag inside a div with a given ID:

    If you want to retrieve the node value of a <p> tag inside an element with a specific ID, you can use an XPath expression instead of getElementsByTagName():

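    $index = 2;
    $id    = 'mydiv';
    
    $dom = new DOMDocument;
    $dom->loadHTML($html);
    
    $xpath = new DOMXPath($dom);
    // Select every <p> that is a direct child of the div with the given ID
    $tags = $xpath->query(
        sprintf('//div[@id="%s"]/p', $id)
    );
    echo $tags->item($index - 1)->nodeValue; // to-do: check if that index exists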
    

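    As a variation on the XPath approach, the position can go into the expression itself, since XPath predicates such as p[2] are one-based. This is only a sketch under a few assumptions: the function name nthParagraphInId() is hypothetical, $id is assumed not to contain a double quote (it is interpolated into the expression unescaped), and the div is assumed to carry an id attribute rather than a class.

    ```php
    <?php
    // Hypothetical helper: text of the nth <p> child of the div with the given id,
    // or null when no such paragraph exists.
    function nthParagraphInId(string $html, string $id, int $index): ?string
    {
        $dom = new DOMDocument;
        // Assumption: collect libxml warnings internally instead of printing them.
        libxml_use_internal_errors(true);
        $dom->loadHTML($html);
        libxml_clear_errors();

        $xpath = new DOMXPath($dom);
        // XPath positions are one-based, so no "- 1" adjustment is needed here.
        $nodes = $xpath->query(sprintf('//div[@id="%s"]/p[%d]', $id, $index));

        if ($nodes === false || $nodes->length === 0) {
            return null;
        }

        return trim($nodes->item(0)->nodeValue);
    }

    $html = "<div id='mydiv'><p><img src='xx.jpg'></p><p>i need here</p><p>lorem ipsum lorem ipsum</p></div>";

    echo nthParagraphInId($html, 'mydiv', 2), PHP_EOL; // i need here
    ```

    If the div is identified by a class instead, the @id test in the expression would need to be replaced with a test on @class.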