Simple HTML Dom - find text between divs


I need to extract the text between the divs here ("The third of four...") using the Simple HTML Dom PHP library.

I think I have tried everything! next_sibling() returns the comment, and next_sibling()->next_sibling() returns the <br/> tag. Ideally I would like to get all the text from the end of the first comment up to the next </div> tag.

<div class="left">
Bla-bla..
<div class="float">Bla-bla...
</div><!--/end of div.float-->
    <br />The third of four performances in the Society's Morning Melodies series features...<a href='index.php?page=tickets&month=20140201'>&lt;&lt; Back to full event listing</a>
</div><!--/end of div.left-->

The code below prints <!--/end of div.float-->, i.e. the comment tag.

// Find the content that follows the div with class "float". There is a comment in between.
$div_float = $html->find("div.float");
$betweendivs = $div_float[0]->next_sibling();
$actual_content = $betweendivs->outertext;
echo $actual_content;

My next step would be to get the innertext of div.left and then delete all the divs inside it, but that seems like a major hassle. Is there anything easier I can do?


Solution

  • Use find('text') to get all the text blocks, or find('text', $index) to get a single one, where $index is the zero-based index of the wanted text...

    So in this case, it's:

    echo $html->find('text', 3);
    
    // OUTPUT:
    The third of four performances in the Society's Morning Melodies series features...
    

    You can read more in the Manual

    EDIT:

    Here's some working code:

    $input = '<div class="left">
    Bla-bla..
    <div class="float">Bla-bla...
    </div><!--/end of div.float-->
        <br />The third of four performances in the Society\'s Morning Melodies series features...<a href="index.php?page=tickets&month=20140201">&lt;&lt; Back to full event listing</a>
    </div><!--/end of div.left-->';
    
    //Create a DOM object
    $html = new simple_html_dom();
    // Load HTML from a string
    $html->load($input);
    
    // Using $index
    echo $html->find('text', 3);
    
    echo "<hr>";
    
    // Or, it's the 3rd element starting from the end
    $text = $html->find('text');
    echo $text[count($text)-3];
    
    // Clear DOM object
    $html->clear();
    unset($html);
    
    // OUTPUT
    The third of four performances in the Society's Morning Melodies series features...
    The third of four performances in the Society's Morning Melodies series features...
    

    Working DEMO
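
    The index passed to find('text', $index) depends on how many text nodes come before the wanted one, so it can shift if the page layout changes. As a hedged alternative, here is a minimal sketch that does not use Simple HTML Dom at all: it uses PHP's bundled DOM extension (DOMDocument and DOMXPath) to walk the siblings that follow div.float and keep only the direct text nodes. The function name textAfterFloat is made up for this illustration, and the sketch assumes the DOM extension is enabled and that the markup has the same shape as in the question.

```php
<?php
// Sketch only: uses PHP's built-in DOM extension, NOT Simple HTML Dom.
// textAfterFloat() is a hypothetical helper name chosen for this example.

$input = '<div class="left">
Bla-bla..
<div class="float">Bla-bla...
</div><!--/end of div.float-->
    <br />The third of four performances in the Society\'s Morning Melodies series features...<a href="index.php?page=tickets&month=20140201">&lt;&lt; Back to full event listing</a>
</div><!--/end of div.left-->';

function textAfterFloat(string $htmlSource): string
{
    // Collect parser warnings internally instead of printing them;
    // real-world HTML is rarely perfectly valid.
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($htmlSource);
    libxml_clear_errors();

    // Locate the first <div class="float">.
    $xpath = new DOMXPath($doc);
    $float = $xpath->query('//div[@class="float"]')->item(0);
    if ($float === null) {
        return '';
    }

    // Walk every sibling after div.float and keep only direct text nodes.
    // This skips the comment, the <br> and the <a> element (with its link text).
    $text = '';
    for ($node = $float->nextSibling; $node !== null; $node = $node->nextSibling) {
        if ($node->nodeType === XML_TEXT_NODE) {
            $text .= $node->nodeValue;
        }
    }

    return trim($text);
}

echo textAfterFloat($input), "\n";
```

    Because only direct text-node siblings are collected, the "Back to full event listing" link text is left out, and no index needs to be counted by hand.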