I have this $html:
$html = '<p>random</p>
<a href="">Test 1</a> (target1)
<br>
<a href="">Test 2</a> (target1)
<br>
<a href="">Test 3</a> (skip)
// etc
';
And I have a few terms in $array:
$array = array(
    '(target1)',
    '(target2)'
);
How can I scan $html with DOMDocument to find every term in $array and grab the content of the <a> tag that precedes it?
So I end up with the following results:
$results = array(
    array(
        'text' => 'Test 1',
        'needle' => 'target1'
    ),
    array(
        'text' => 'Test 2',
        'needle' => 'target1'
    )
);
With the following approach, I have managed to grab the content of all <a> tags in $html:
$doc = new DOMDocument();
$doc->loadHTML('<?xml encoding="utf-8" ?>' . $html);
$xpath = new DOMXPath($doc);
$elements = $xpath->query('//a');
$el_array = array();
if ($elements->length > 0) {
    foreach ($elements as $n) {
        $node = trim(strip_tags($n->nodeValue));
        if (!empty($node)) {
            $el_array[] = $node;
        }
    }
    if (!empty($el_array) && is_array($el_array)) {
        print_r($el_array);
    }
}
But I have not found a way to grab the target terms so that I can check if we have a match.
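You can build the XPath query dynamically with contains() and the following-sibling axis: select the text nodes that follow an <a> and contain any of the terms, then read each match's previous sibling to get the link.
The XPath expression will be: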
//a/following-sibling::text()[contains(., '(target1)') or contains(., '(target2)')]
For example:
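// $html is the string from the question.
$array = array(
    '(target1)',
    '(target2)'
);
// Build "contains(., '(target1)') or contains(., '(target2)')".
// The terms are wrapped in single quotes, so they must not contain one themselves.
$contains = implode(" or ", array_map(function($x) {
    return "contains(., '$x')";
}, $array));
$doc = new DOMDocument();
$doc->loadHTML('<?xml encoding="utf-8" ?>' . $html);
$xpath = new DOMXPath($doc);
// Text nodes that follow an <a> and contain at least one of the terms.
$elements = $xpath->query("//a/following-sibling::text()[$contains]");
$results = [];
foreach ($elements as $element) {
    // previousSibling is the <a> directly before the matched text node.
    $results[] = [$element->previousSibling->nodeValue, trim($element->nodeValue)];
}
print_r($results);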
Result:
Array
(
    [0] => Array
        (
            [0] => Test 1
            [1] => (target1)
        )

    [1] => Array
        (
            [0] => Test 2
            [1] => (target1)
        )

)
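If you want the exact shape asked for in the question ('text' and 'needle' keys, with the parentheses stripped from the term), a minimal sketch could look like the one below. It is a variation rather than the same query: it assumes each term sits in the text node immediately after the link, so it selects only the first following sibling of each <a>, and it does the term matching in PHP with strpos(), which sidesteps quoting the terms inside the XPath string. The sample $html is copied from the question, without the "// etc" placeholder line.

```php
<?php
// Sample input copied from the question.
$html = '<p>random</p>
<a href="">Test 1</a> (target1)
<br>
<a href="">Test 2</a> (target1)
<br>
<a href="">Test 3</a> (skip)
';

$array = array('(target1)', '(target2)');

$doc = new DOMDocument();
$doc->loadHTML('<?xml encoding="utf-8" ?>' . $html);
$xpath = new DOMXPath($doc);

$results = array();

// For each <a>, take the node directly after it, and keep it only if it is a text node.
$query = '//a/following-sibling::node()[1][self::text()]';

foreach ($xpath->query($query) as $textNode) {
    $text = trim($textNode->nodeValue);

    foreach ($array as $term) {
        if (strpos($text, $term) !== false) {
            $results[] = array(
                // The previous sibling of the selected text node is the <a> itself.
                'text'   => trim($textNode->previousSibling->nodeValue),
                // Assumption: the needle is the term without its surrounding parentheses.
                'needle' => trim($term, '()'),
            );
            break; // at most one result per link
        }
    }
}

print_r($results);
```

With the sample input this yields two entries, 'Test 1' and 'Test 2', both with the needle 'target1'. If a term can appear further away from its link (for example after another element), the broader following-sibling::text() query from the answer is the better fit.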