php nested pcre codeblocks curly-braces

PCRE: Find matching brace for code block


Is there a way for a PCRE regular expression to count how many occurrences of one character it encounters (n), and to stop searching once it has found n occurrences of another character (specifically { and })?

This is to grab code blocks (which may or may not have code blocks nested inside them).

If it makes things simpler, the input will be a single-line string in which the only characters other than braces are digits, colons and commas. The input must pass the following check before any attempt is made to extract code blocks:

$regex = '%^(\\d|\\:|\\{|\\}|,)*$%';

All braces will have a matching pair and will be nested correctly.
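
As a rough sketch of the validation step described above (the function name `is_allowed_input` is made up for illustration, and the pattern is rewritten as a single character class, which should accept the same strings as the alternation in `$regex`):

```php
<?php
// Hypothetical helper, not part of the original question.
// Returns true when $input contains only digits, colons, commas and braces.
function is_allowed_input(string $input): bool
{
    // Same character set as the alternation above, written as one character class.
    return preg_match('%^[\d:{},]*$%', $input) === 1;
}

var_dump(is_allowed_input('1:{2,3:{4}},5')); // bool(true)
var_dump(is_allowed_input('1:{2;3}'));       // bool(false): ';' is not allowed
```

Note that this only checks the character set; it says nothing about whether the braces are balanced.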

I would like to know if this can be achieved before I start writing a script to check every character in the string and count each occurrence of a brace. Regular expressions would be much more memory friendly as these strings can be several kilobytes in size!

Thanks, mniz.

Solution


  • PCRE supports recursive patterns, so you can do something like this:

    $code_is_valid = preg_match('~^({ ( (?>[^{}]+) | (?1) )* })$~x', '{' . $code .'}');
    

    That said, I don't think this will be faster or use less memory than a simple counter, especially on large strings.

    And this is how to find all (valid) code blocks in a string:

    preg_match_all('~ { ( (?>[^{}]+) | (?R) )* } ~x', $input, $blocks);
    print_r($blocks);
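
For comparison, here is a minimal sketch of the "simple counter" alternative mentioned above. The function name `extract_top_level_blocks` is hypothetical, and the code assumes the braces are already balanced and correctly nested, as the question states. It walks the string once, tracks the nesting depth, and records a block each time the depth returns to zero:

```php
<?php
// Hypothetical helper, not from the original answer.
// Returns every top-level {...} block in $input, assuming balanced braces.
function extract_top_level_blocks(string $input): array
{
    $blocks = [];
    $depth  = 0;               // current nesting depth
    $start  = 0;               // offset of the '{' that opened the current top-level block
    $length = strlen($input);

    for ($i = 0; $i < $length; $i++) {
        if ($input[$i] === '{') {
            if ($depth === 0) {
                $start = $i;   // a new top-level block begins here
            }
            $depth++;
        } elseif ($input[$i] === '}') {
            $depth--;
            if ($depth === 0) {
                // Depth is back to zero: the top-level block is complete.
                $blocks[] = substr($input, $start, $i - $start + 1);
            }
        }
    }

    return $blocks;
}

print_r(extract_top_level_blocks('1:{2,3:{4}},5:{6}'));
```

For the sample input this yields the two blocks `{2,3:{4}}` and `{6}`. Nested blocks are not returned separately; to get those, the function could be applied again to the inside of each result.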