I have some HTML, and I'm having trouble parsing data out of it, specifically from the part given below:
<li id=xyz>
John Johnson
<sup>1<sup>
","
</li>
I want to extract "John Johnson" from this list item and nothing else. I'm not sure how to do so. Thanks.
find('text') is what you're after. It returns all text blocks found in the source.
Based on your example, here's working code:
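// Load the Simple HTML DOM library (assumed to be in the same directory; adjust the path as needed)
include 'simple_html_dom.php';
// Test data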
$input = <<<_DATA_
<li id=xyz>
John Johnson
<sup>1<sup>
","
</li>
_DATA_;
// Create a DOM object
$html = new simple_html_dom();
// Load HTML from a string
$html->load($input);
// >> Long answer
echo "Long answer:<br/>";
// Search all text nodes inside the target node
$search = $html->find('li#xyz text');
// Loop through each node and print it
foreach( $search as $i => $txt ) {
// "->plaintext" is optional here, since a text node's content is already plain text
echo "$i => " . $txt->plaintext . "<br/>";
}
// >> Short answer
echo "<hr>";
echo "Short answer:<br/>";
// Passing an index as the second argument (0 here) returns only that element instead of the array of all search results
echo $html->find('li#xyz text', 0)->plaintext;
// Clear DOM object
$html->clear();
unset($html);
OUTPUT:
Long answer:
0 => John Johnson
1 => 1
2 => ","
3 =>
-------------------
Short answer:
John Johnson
For more details, check the Simple HTML DOM manual.
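If you'd rather not depend on Simple HTML DOM, roughly the same result can be obtained with PHP's built-in DOM extension. This is only a minimal sketch, assuming the DOM extension is enabled. firstTextOfElement is a hypothetical helper name made up for this example, not part of any library. The XPath step text() selects only the text nodes that are direct children of the li, so the content of the sup is skipped.

```php
<?php
// Hypothetical helper (not part of Simple HTML DOM or PHP itself): returns the
// first non-empty text node directly under the element with the given id.
function firstTextOfElement(string $html, string $id): ?string
{
    $doc = new DOMDocument();
    // Real-world markup is rarely valid, so collect libxml warnings silently.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Note: $id is interpolated as-is, so pass only trusted values here.
    $nodes = $xpath->query("//li[@id='$id']/text()");
    foreach ($nodes as $node) {
        $text = trim($node->nodeValue);
        if ($text !== '') {
            return $text;
        }
    }
    return null; // no matching element, or no text directly inside it
}

// Markup copied from the question
$input = <<<HTML
<li id=xyz>
John Johnson
<sup>1<sup>
","
</li>
HTML;

echo firstTextOfElement($input, 'xyz');
```

The trim() call matters because the text node includes the line breaks around the name. If no element has that id, the helper returns null rather than raising an error.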