I'm using DOMDocument and DOMXPath to check whether a keyword phrase appears in my HTML content, for example to find out whether the keyword is in bold. The following code works fine, except that I need to ignore some characters when searching for the keyword:
$characters_to_ignore = array(':','(',')','/');
$keyword = 'keyword AAA';
$content = "Some HTML content for example <b>keyword: AAA</b> and other HTML";
$exp = '//b[contains(., "' . $keyword . '")]|//strong[contains(., "' . $keyword . '")]|//span[contains(@style, "bold") and contains(., "' . $keyword . '")]';
$doc = new DOMDocument();
$doc->loadHTML($content); // parse the HTML before querying it
$xpath = new DOMXPath($doc);
$elements = $xpath->query($exp);
I need to match "keyword: AAA" as well as "keyword AAA", so the DOMXPath query has to ignore the characters in $characters_to_ignore when searching for the keyword phrase.
The code above works fine for "keyword AAA". How can I change it to match "keyword: AAA" too (and likewise for any of the other characters in $characters_to_ignore)?
New information: maybe using this? But I can't get a working example.
Well, you probably already solved it somehow, but here's the solution...
It would be trivial with the XPath 2.0 function matches(), but PHP's DOMXPath class supports only XPath 1.0.
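XPath 1.0 does have translate(), though, which can delete characters from the node text before contains() runs. A minimal sketch of that idea, assuming neither the keyword nor the ignored characters contain a double quote (the variable names are illustrative, not from the original code):

```php
<?php
// Characters to strip before comparing (taken from the question).
$charactersToIgnore = array(':', '(', ')', '/');
$keyword = 'keyword AAA';
$content = 'Some HTML content for example <b>keyword: AAA</b> and other HTML';

$doc = new DOMDocument();
$doc->loadHTML($content);
$xpath = new DOMXPath($doc);

// translate(., ':()/', '') deletes every listed character from the node text,
// because the replacement string is empty.
$ignore = implode('', $charactersToIgnore);
$test = 'contains(translate(., "' . $ignore . '", ""), "' . $keyword . '")';
$exp = "//b[$test]|//strong[$test]|//span[contains(@style, 'bold') and $test]";

$elements = $xpath->query($exp);
echo $elements->length, "\n";
```

Note that this strips the characters everywhere in the text, so something like "key:word AAA" would match as well. If that is too loose, a regular expression gives finer control.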
As of PHP 5.3, however, the DOMXPath class has the registerPHPFunctions() method, which allows us to use PHP functions as XPath functions. :)
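To show the mechanism in isolation, here is a small sketch on a made-up HTML snippet. It restricts the callable functions to preg_match, which registerPHPFunctions() supports through its optional argument, and compares the result with 1, since preg_match returns an integer rather than a boolean:

```php
<?php
$doc = new DOMDocument();
$doc->loadHTML('<p><b>keyword: AAA</b> and <b>something else</b></p>');

$xpath = new DOMXPath($doc);
// Bind the "php" prefix to the namespace the DOM extension expects.
$xpath->registerNamespace('php', 'http://php.net/xpath');
// Allow only preg_match to be called from XPath expressions.
$xpath->registerPHPFunctions('preg_match');

// php:functionString passes the string value of the node (.) to the function.
// The explicit "= 1" turns the integer result into a proper boolean test.
$matches = $xpath->query("//b[php:functionString('preg_match', '/AAA/', .) = 1]");
echo $matches->length, "\n";
```

Only the first `<b>` element should be selected here, because the second one does not contain "AAA".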
Making it work:
$keyword = 'AAA';
// Allow at most one ignored character between "keyword" and the space.
$regex = "|keyword[:()/]? $keyword|";
$content = "Some HTML content for example <b>keyword: AAA</b> and other HTML";
// preg_match returns an integer, so compare it with 1 to get a reliable boolean predicate.
$match = "php:functionString('preg_match', '$regex', .) = 1";
$exp = "//b[$match]|//strong[$match]|//span[contains(@style, 'bold') and $match]";
$doc = new DOMDocument();
$doc->loadHTML($content);
$xpath = new DOMXPath($doc);
$xpath->registerNamespace('php', 'http://php.net/xpath');
// Without this call the php: functions are not available to the query.
$xpath->registerPHPFunctions();
$elements = $xpath->query($exp);
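The regex above hard-codes the ignored characters. One way to build it from the $characters_to_ignore array in the question could look like the sketch below. It assumes the keyword words are separated by single spaces and that neither the keyword nor the ignored characters contain a single quote; the "~" delimiter and the variable names are arbitrary choices for this example.

```php
<?php
$charactersToIgnore = array(':', '(', ')', '/');
$keyword = 'keyword AAA';
$content = 'Some <b>keyword: AAA</b>, some <b>other text</b> and <strong>(keyword) /AAA</strong>';

// Character class that matches any run of ignored characters, e.g. [\:\(\)/]*
$quotedChars = array_map(function ($c) {
    return preg_quote($c, '~');
}, $charactersToIgnore);
$ignored = '[' . implode('', $quotedChars) . ']*';

// Escape each word and allow ignored characters around the whitespace between words.
$words = array_map(function ($w) {
    return preg_quote($w, '~');
}, explode(' ', $keyword));
$regex = '~' . implode($ignored . '\s+' . $ignored, $words) . '~';

$doc = new DOMDocument();
$doc->loadHTML($content);
$xpath = new DOMXPath($doc);
$xpath->registerNamespace('php', 'http://php.net/xpath');
$xpath->registerPHPFunctions('preg_match');

// Compare with 1 so the integer result of preg_match acts as a boolean test.
$match = "php:functionString('preg_match', '$regex', .) = 1";
$exp = "//b[$match]|//strong[$match]|//span[contains(@style, 'bold') and $match]";

$elements = $xpath->query($exp);
foreach ($elements as $element) {
    echo $element->nodeName, ': ', $element->textContent, "\n";
}
```

With this sample content, the first `<b>` and the `<strong>` element should be selected, while `<b>other text</b>` is skipped.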