php html parsing html-lists simple-html-dom

How to parse a list using simple html dom

I have some HTML, and I'm having trouble extracting a piece of data from it, specifically from the part shown below:

<li id=xyz>
  John Johnson
<sup>1<sup>
","
</li>

I want to extract only "John Johnson" from this list item and nothing else. I'm not sure how to do that. Thanks.

Solution

find('text') is what you're after. It returns every text node found in the source, and you can narrow it to a single element with a selector such as 'li#xyz text'.

Based on your example, here is working code:

// Test data
$input = <<<_DATA_
    <li id=xyz>
      John Johnson
    <sup>1<sup>
    ","
    </li>
_DATA_;

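// Load the library (adjust the path to wherever simple_html_dom.php lives on your system)
require_once 'simple_html_dom.php';

// Create a DOM object
$html = new simple_html_dom();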
// Load HTML from a string
$html->load($input);

// >> Long answer
echo "Long answer:<br/>";

// Search all text nodes inside the target node
$search = $html->find('li#xyz text');

// Loop through each node and print it
foreach( $search as $i => $txt ) {
    // "->plaintext" is technically redundant for text nodes, but being explicit does no harm
    echo "$i => " . $txt->plaintext . "<br/>";
}

// >> Short answer
echo "<hr>";
echo "Short answer:<br/>";

// Passing an index as the second argument (0 here) returns that single result instead of the whole array
echo $html->find('li#xyz text', 0)->plaintext;

// Clear DOM object
$html->clear();
unset($html);

OUTPUT:

Long answer:
0 => John Johnson 
1 => 1
2 => "," 
3 => 
-------------------
Short answer:
John Johnson
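
Note that the text nodes in the output above still carry their surrounding whitespace, and the last one is empty. A minimal sketch of one way to clean that up is shown here; firstNonEmptyText is a hypothetical helper written for this answer, not part of simple_html_dom, and the sample array only approximates the four values printed above.

```php
<?php
/**
 * Return the first entry that is not empty after trimming, or null if none is.
 * Hypothetical helper: it works on plain strings, so it does not depend on simple_html_dom.
 */
function firstNonEmptyText(array $texts): ?string
{
    foreach ($texts as $text) {
        // Strip leading/trailing spaces and newlines left over from the HTML layout
        $clean = trim((string) $text);
        if ($clean !== '') {
            return $clean;
        }
    }
    return null;
}

// Assumed stand-in for the four text nodes listed in the output above
$texts = ["\n  John Johnson \n", "1", "\n\",\" \n", ""];
echo firstNonEmptyText($texts);
```

With simple_html_dom you would build the array from the search results first, for example by collecting $txt->plaintext for each node returned by $html->find('li#xyz text').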

For more details, check the manual.
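
If adding simple_html_dom to the project is not an option, roughly the same result can be had with PHP's bundled DOM extension. This is a swapped-in technique rather than the approach used in the answer above. The function name firstTextOfListItem is hypothetical, and the sketch assumes the DOM extension is enabled and that the id contains no double quotes.

```php
<?php
/**
 * Return the first non-empty text that is a direct child of <li id="...">, or null.
 * Hypothetical helper built on DOMDocument and DOMXPath (not simple_html_dom).
 */
function firstTextOfListItem(string $html, string $id): ?string
{
    // Collect parser warnings (e.g. for the unclosed <sup>) instead of printing them
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // "/text()" matches only text nodes directly under the <li>, so the <sup> content is skipped
    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query(sprintf('//li[@id="%s"]/text()', $id));

    foreach ($nodes as $node) {
        $text = trim($node->nodeValue);
        if ($text !== '') {
            return $text;
        }
    }
    return null;
}

// Same fragment as in the question
$input = "<li id=xyz>\n  John Johnson\n<sup>1<sup>\n\",\"\n</li>";
echo firstTextOfListItem($input, 'xyz');
```

The design difference is that the XPath query restricts the search to direct children of the list item, whereas 'li#xyz text' in simple_html_dom returns the nested text nodes as well and leaves it to you to pick index 0.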