
Using DOMDocument and DOMXPath how can I ignore some characters for the match?


I'm using DOMDocument and DOMXPath to determine whether a keyword phrase appears in my HTML content, for example to check whether the keyword is in bold. The following code works fine, except that I need to "ignore" some characters when searching for the keyword:

$characters_to_ignore = array(':','(',')','/');
$keyword = strtolower('keyword AAA'); // the content is lowercased before parsing, so lowercase the keyword too
$content = "Some HTML content for example <b>keyword: AAA</b> and other HTML";
$exp = '//b[contains(., "' . $keyword . '")]|//strong[contains(., "' . $keyword . '")]|//span[contains(@style, "bold") and contains(., "' .  $keyword . '")]';

$doc = new DOMDocument();
$doc->loadHTML(strtolower($content));
$xpath = new DOMXPath($doc);
$elements = $xpath->query($exp);

I need to match "keyword: AAA" as well as "keyword AAA", so the DOMXPath query has to ignore the characters in $characters_to_ignore when searching for the keyword phrase.

The code above works fine for "keyword AAA". How can I change it to match "keyword: AAA" too (and any of the other characters in $characters_to_ignore)?

New Information: Maybe using this?

fn:contains(string1,string2)

but I can't get a working example.


Solution

  • Well, you probably already solved it somehow, but here's the solution...

    It would be trivial using the XPath 2.0 function matches(), but PHP's DOMXPath class only supports XPath 1.0.

    However, as of PHP 5.3, the DOMXPath class has the registerPHPFunctions() method, which allows us to use PHP functions as XPath functions. :)

    Making it work:

    $keyword = 'AAA';
    $regex = "|keyword[:()/]? $keyword|";
    $content = "Some HTML content for example <b>keyword: AAA</b> and other HTML";
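    // Compare the preg_match() result with 1 explicitly: XPath 1.0 does not treat it as a
    // plain boolean (a bare number is a position test, and a non-empty string such as "0" is true).
    $exp = "//b[php:functionString('preg_match', '$regex', .) = 1]|//strong[php:functionString('preg_match', '$regex', .) = 1]|//span[contains(@style, 'bold') and php:functionString('preg_match', '$regex', .) = 1]";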
    
    $doc = new DOMDocument();
    $doc->loadHTML($content);
    $xpath = new DOMXPath($doc);
    $xpath->registerNamespace('php', 'http://php.net/xpath');
    $xpath->registerPHPFunctions();
    $elements = $xpath->query($exp);
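
If you would rather stay in plain XPath 1.0, something along these lines might also work: translate() with an empty replacement string deletes the ignored characters from the node text before contains() runs, and normalize-space() collapses any doubled spaces left behind. This is only a sketch, not part of the answer above. The helper name buildBoldKeywordQuery is made up for illustration, and it assumes that neither the keyword nor the ignored characters contain a double quote.

```php
<?php
// Hypothetical helper (not from the original answer): builds an XPath 1.0
// expression that strips the ignored characters before testing contains().
function buildBoldKeywordQuery(string $keyword, array $charactersToIgnore): string
{
    // translate(., ":()/", "") deletes every listed character from the node text.
    $ignored = implode('', $charactersToIgnore);

    // Assumption: no double quotes in $keyword or $ignored, because XPath 1.0
    // string literals have no escape syntax.
    $test = sprintf(
        'contains(normalize-space(translate(., "%s", "")), "%s")',
        $ignored,
        $keyword
    );

    return "//b[$test]|//strong[$test]|//span[contains(@style, \"bold\") and $test]";
}

$charactersToIgnore = array(':', '(', ')', '/');
$content = 'Some HTML content for example <b>keyword: AAA</b> and <strong>keyword AAA</strong>';

$doc = new DOMDocument();
$doc->loadHTML($content);
$xpath = new DOMXPath($doc);

$elements = $xpath->query(buildBoldKeywordQuery('keyword AAA', $charactersToIgnore));

// Print each matching element with its text content.
foreach ($elements as $element) {
    echo $element->nodeName, ': ', $element->textContent, PHP_EOL;
}
```

With the sample content, both the b and the strong element should be matched. Unlike the preg_match() version, this needs no registerPHPFunctions() call, but it also cannot express anything more elaborate than "delete these characters, then do a substring test".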