
Get the content of an <a> tag if a term follows it, using DOMDocument


I have this $html:

$html = '<p>random</p>
<a href="">Test 1</a> (target1)
<br>
<a href="">Test 2</a>  (target1)
<br>
<a href="">Test 3</a> (skip)
// etc
';

And I have a few terms in $array:

$array = array(
    '(target1)',
    '(target2)'
);

How can I scan $html with DOMDocument to find every term in $array and grab the content of the <a> tag that precedes each match?

The goal is to end up with the following results:

$results = array(
    array(
        'text' => 'Test 1',
        'needle' => 'target1'
    ),
    array(
        'text' => 'Test 2',
        'needle' => 'target1'
    )
);

What I've tried so far

With the following approach, I have managed to grab the content of all <a> tags in $html:

$doc = new DOMDocument();
// The XML encoding declaration makes loadHTML() treat the input as UTF-8.
$doc->loadHTML('<?xml encoding="utf-8" ?>' . $html);
$xpath = new DOMXPath($doc);

// Collect the trimmed, non-empty text of every <a> element.
$elements = $xpath->query('//a');
$el_array = array();
foreach ($elements as $n) {
    $node = trim(strip_tags($n->nodeValue));
    if (!empty($node)) {
        $el_array[] = $node;
    }
}
if (!empty($el_array)) {
    print_r($el_array);
}

But I haven't found a way to grab the term that follows each <a> tag, so I can't check it for a match.
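Building on the loop above, one way to reach the term is through each link's `nextSibling`, because the term sits in the text node right after `</a>`. This is a minimal sketch, assuming the term always lives in that immediately following text node; it strips the parentheses from the matched term so `needle` has the format requested above.

```php
<?php
// Sample input from the question (the "// etc" placeholder left out).
$html = '<p>random</p>
<a href="">Test 1</a> (target1)
<br>
<a href="">Test 2</a>  (target1)
<br>
<a href="">Test 3</a> (skip)
';

$array = array('(target1)', '(target2)');

// Collect libxml parse warnings instead of printing them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML('<?xml encoding="utf-8" ?>' . $html);
$xpath = new DOMXPath($doc);

$results = array();
foreach ($xpath->query('//a') as $a) {
    // Assumption: the term sits in the text node right after </a>.
    $next = $a->nextSibling;
    if (!($next instanceof DOMText)) {
        continue;
    }
    foreach ($array as $term) {
        if (strpos($next->nodeValue, $term) !== false) {
            $results[] = array(
                'text'   => trim($a->textContent),
                // "(target1)" -> "target1", the needle format asked for above.
                'needle' => trim($term, '()'),
            );
            break; // record at most one term per link
        }
    }
}

print_r($results);
```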


Solution

  • You can build the XPath query dynamically with contains() and the following-sibling axis.

    For the two terms in $array, the XPath expression is:

    //a/following-sibling::text()[contains(., '(target1)') or contains(., '(target2)')]
    

    For example:

    $array = array(
        '(target1)',
        '(target2)'
    );
    
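    // Build "contains(., '(target1)') or contains(., '(target2)')" from $array.
    $contains = implode(" or ", array_map(function($x) {
        return "contains(., '$x')";
    }, $array));
    
    $doc = new DOMDocument();
    $doc->loadHTML('<?xml encoding="utf-8" ?>' . $html);
    $xpath = new DOMXPath($doc);
    // Text nodes that follow an <a> and contain one of the terms.
    $elements = $xpath->query("//a/following-sibling::text()[$contains]");
    $results = [];
    
    // Pair the preceding link's text with the trimmed matching text.
    foreach ($elements as $element) {
        $results[] = [$element->previousSibling->nodeValue, trim($element->nodeValue)];
    }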
    
    print_r($results);
    

    Result:

    Array
    (
        [0] => Array
            (
                [0] => Test 1
                [1] => (target1)
            )
    
        [1] => Array
            (
                [0] => Test 2
                [1] => (target1)
            )
    
    )
    

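The answer's query selects every text node that follows some `<a>` at any distance, so `previousSibling` is not guaranteed to be the link (for example, with `<a>X</a><br> (target1)` it would be the `<br>`). Interpolating the terms into `'...'` also breaks if a term contains a single quote, and the output pairs differ from the `text`/`needle` shape asked for. A minimal sketch that addresses those points, assuming the term sits in the text node immediately after `</a>`; `xpathLiteral()` is a hypothetical helper written for this example, not part of DOMXPath:

```php
<?php
// Hypothetical helper: quote any PHP string as an XPath 1.0 string literal.
// XPath 1.0 has no escape sequences, so a string containing both quote
// types has to be assembled with concat().
function xpathLiteral(string $s): string
{
    if (strpos($s, "'") === false) {
        return "'" . $s . "'";
    }
    if (strpos($s, '"') === false) {
        return '"' . $s . '"';
    }
    return "concat('" . str_replace("'", "', \"'\", '", $s) . "')";
}

// Sample input from the question (the "// etc" placeholder left out).
$html = '<p>random</p>
<a href="">Test 1</a> (target1)
<br>
<a href="">Test 2</a>  (target1)
<br>
<a href="">Test 3</a> (skip)
';

$array = array('(target1)', '(target2)');

// contains(., '(target1)') or contains(., '(target2)')
$contains = implode(' or ', array_map(function ($term) {
    return 'contains(., ' . xpathLiteral($term) . ')';
}, $array));

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML('<?xml encoding="utf-8" ?>' . $html);
$xpath = new DOMXPath($doc);

// node()[1] keeps only the node directly after each <a>; it must be a text
// node containing one of the terms, so previousSibling is always the <a>.
$query = "//a/following-sibling::node()[1][self::text()][{$contains}]";

$results = array();
foreach ($xpath->query($query) as $textNode) {
    foreach ($array as $term) {
        if (strpos($textNode->nodeValue, $term) !== false) {
            $results[] = array(
                'text'   => trim($textNode->previousSibling->textContent),
                'needle' => trim($term, '()'),
            );
            break;
        }
    }
}

print_r($results);
```

The PHP loop re-checks which term matched because the XPath query returns the matching node, not the predicate that selected it.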