RegEx: Count chars

I'm writing a PHP script that searches for particular headlines inside a DokuWiki document.

My current pattern looks like this:

$pattern = "/.*=+ " . preg_quote($header, "/") . " =+([^=]+)/m";
preg_match($pattern, $art->text, $m);
if (!empty($m[1])) {
   $art->text = $m[1];
} else {
   $art->text = "";
}

A sample document:

====== TestHeader ======
Testtext

===== Header2 =====
Testtext2

==== Header3 ====
Testtext3

====== Header4 ======
Testtext4

When searching for TestHeader, my current result is:

====== TestHeader ======
Testtext

I would like the pattern to return:

====== TestHeader ======
Testtext

===== Header2 =====
Testtext2

==== Header3 ====
Testtext3

Or, in other words: I would like to also match all following headers that are surrounded by fewer = than the header I was searching for.

Is something like this possible with regular expressions?

Thanks in advance!

Solution

I'm not a great PHP coder, so I don't know whether PHP offers any special extensions to "normal" regexps that would allow what you want. Leaving such extensions aside, regular expressions in the classical sense can't solve your problem.

There is some formal language theory behind that, in case you are interested: regexps can only recognise so-called "regular languages" (see the corresponding Wikipedia article). Without diving into the theory too much, I can give you the intuition that regular expressions can't "count" things (at least not in the sense that they can compare two counts within the match). To restate the WP example: you can't write a regular expression that matches exactly those strings consisting of N a's followed by N b's, for arbitrary N.
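To make the "can't count" intuition concrete, here is a minimal PHP sketch. The function name isBalancedAb is made up for illustration. The regex only checks the shape (a run of a's, then a run of b's) and captures both runs; the comparison of the two counts is done in ordinary code.

```php
<?php
// Hypothetical helper: true if $s consists of N a's followed by N b's (N >= 1).
function isBalancedAb(string $s): bool
{
    // The regex only verifies the shape "some a's, then some b's"
    // and captures each run. It does not compare their lengths.
    if (!preg_match('/^(a+)(b+)$/', $s, $m)) {
        return false;
    }
    // The actual counting happens here, outside the regex.
    return strlen($m[1]) === strlen($m[2]);
}

var_dump(isBalancedAb('aaabbb')); // bool(true)
var_dump(isBalancedAb('aaabb'));  // bool(false)
```

The same split (regex for the shape, code for the counting) is what the headline problem needs as well.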

Of course, this is no mathematical proof that what you are looking for is impossible, but it should give you a feeling for what regular expressions can and can't do. HTH
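One way around the limitation is to let the regex recognise headline lines and do the counting of = signs in PHP. The following is only a sketch under assumptions: the function name extractSection is hypothetical, headlines are assumed to sit on their own line in the form "=== Title ===", and a section is assumed to end at the next headline with the same number of = signs or more.

```php
<?php
// Hypothetical helper: returns the section that starts at the headline $header,
// including all following subsections whose headlines use fewer '=' signs.
function extractSection(string $text, string $header): string
{
    $lines = preg_split('/\R/', $text); // split on any line break
    $level = null;                      // number of '=' of the headline we searched for
    $out   = [];

    foreach ($lines as $line) {
        // Assumed headline shape: "==== Title ====".
        if (preg_match('/^(=+) (.+?) =+\s*$/', $line, $m)) {
            $count = strlen($m[1]);
            if ($level === null) {
                if ($m[2] === $header) {
                    $level = $count;    // found the start, begin collecting
                    $out[] = $line;
                }
                continue;
            }
            if ($count >= $level) {
                break;                  // same or higher-level headline ends the section
            }
        }
        if ($level !== null) {
            $out[] = $line;             // body text or a lower-level headline
        }
    }

    // Empty string if the headline was not found.
    return rtrim(implode("\n", $out));
}
```

With the sample document from the question, extractSection($art->text, "TestHeader") should return everything from the TestHeader headline up to, but not including, the Header4 headline.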