
How to parse a list item using Simple HTML DOM


I have some HTML and I'm having trouble parsing data out of it, specifically from this part:

<li id=xyz>
  John Johnson
<sup>1<sup>
","
</li>

I want to extract "John Johnson" from this list item and nothing else, but I'm not sure how. Thanks.


Solution

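  • find('text') is what you're after. On its own it returns every text node in the document; combined with a parent selector, as in 'li#xyz text', it returns only the text nodes inside that element.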

    Based on your example, here's working code:

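    // Load the library (adjust the path to wherever simple_html_dom.php lives)
    require_once 'simple_html_dom.php';

    // Test data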
    $input = <<<_DATA_
        <li id=xyz>
          John Johnson
        <sup>1<sup>
        ","
        </li>
    _DATA_;
    
    // Create a DOM object
    $html = new simple_html_dom();
    // Load HTML from a string
    $html->load($input);
    
    // >> Long answer
    echo "Long answer:<br/>";
    
    // Search all text nodes inside the target node
    $search = $html->find('li#xyz text');
    
    // Loop through each node and print it
    foreach ($search as $i => $txt) {
        // A text node has no child tags, so ->plaintext simply returns its text,
        // which may still carry surrounding whitespace
        echo "$i => " . $txt->plaintext . "<br/>";
    }
    
    // >> Short answer
    echo "<hr>";
    echo "Short answer:<br/>";
    
    // Passing an index as the second argument returns that single match instead
    // of the array of all matches (0 = the first text node inside li#xyz);
    // trim() drops the whitespace around the name
    echo trim($html->find('li#xyz text', 0)->plaintext);
    
    // Clear DOM object
    $html->clear();
    unset($html);
    

    Output (as rendered in a browser):

    Long answer:
    0 => John Johnson 
    1 => 1
    2 => "," 
    3 => 
    -------------------
    Short answer:
    John Johnson
    

    For more details, check the Simple HTML DOM manual.
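If you'd rather not depend on Simple HTML DOM, a similar result can be had with PHP's built-in DOM extension. This is an alternative technique, not part of the Simple HTML DOM API: the sketch below assumes ext-dom is enabled, and the function name directTextOfListItem is hypothetical. The XPath text() step selects only the li's own text children, so the <sup> content is skipped without relying on an index, and trim() strips the surrounding whitespace.

```php
<?php
// Alternative sketch using PHP's built-in DOM extension (ext-dom),
// NOT Simple HTML DOM. directTextOfListItem is a hypothetical helper name.

/**
 * Returns the first non-empty, trimmed text node that is a direct child
 * of <li id="$id">, ignoring text inside child elements such as <sup>.
 */
function directTextOfListItem(string $html, string $id): string
{
    $doc = new DOMDocument();
    // Collect libxml warnings internally instead of printing them
    // (the sample markup has an unclosed <sup>).
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // "/text()" matches only the li's own text children, not its descendants' text.
    $nodes = $xpath->query(sprintf('//li[@id="%s"]/text()', $id));

    foreach ($nodes as $node) {
        $text = trim($node->nodeValue);
        if ($text !== '') {
            return $text; // first non-empty direct text node
        }
    }
    return ''; // no matching li, or it has no direct text
}

// Same test data as in the question
$input = <<<_DATA_
<li id=xyz>
  John Johnson
<sup>1<sup>
","
</li>
_DATA_;

echo directTextOfListItem($input, 'xyz');
```

Unlike find('li#xyz text', 0), which returns the first text node anywhere inside the li (including nested tags), this version only looks at the li's direct text children and skips whitespace-only nodes.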