I have a regex that works perfectly with pcregrep:
pcregrep -M '([a-zA-Z0-9_&*]+)(\(+)([a-zA-Z0-9_ &\*]+)(\)+)(\n)(\{)'
I then tried to use this regex in my C++ code, but it does not match (escapes included):
char const *regex = "([a-zA-Z0-9_&*]+)\\(+([a-zA-Z0-9_ &\\*]+)\\)+(?>\n+)\\{+";
re = pcre_compile(regex, PCRE_MULTILINE, &error, &erroffset, 0);
I'm trying to find function bodies like this (the line break is 0a in hex):
my_function(char *str)
{
Why does it work with pcregrep and not within the C++ code?
Your first regex:
( [a-zA-Z0-9_&*]+ ) # (1)
( \(+ ) # (2)
( [a-zA-Z0-9_ &\*]+ ) # (3)
( \)+ ) # (4)
( \n ) # (5)
( \{ ) # (6)
Your second regex:
( [a-zA-Z0-9_&*]+ ) # (1)
\(+
( [a-zA-Z0-9_ &\*]+ ) # (2)
\)+
(?> \n+ )
\{+
Other than the different capture groups and an unnecessary atomic group (?>), there is one obvious difference:
the last newline and the curly brace in the second regex have + quantifiers.
But + means one or more, so I think anything the first regex matches, the second should match as well.
The less obvious difference is that we don't know whether the files were opened in translated (text) mode or not, so the line breaks your code sees may be \r\n rather than a bare \n.
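One way to check this is to look at the raw bytes your program actually hands to PCRE. The following is only a minimal sketch: count_line_endings and LineEndingCounts are hypothetical names, and it assumes you already have the file contents in a std::string (for example, read with std::ios::binary so nothing is translated).

```cpp
#include <cstddef>
#include <string>

// Tally of the line-break styles found in a buffer.
struct LineEndingCounts {
    std::size_t lf = 0;    // bare "\n"  (0a)
    std::size_t crlf = 0;  // "\r\n"     (0d 0a)
    std::size_t cr = 0;    // bare "\r"  (0d)
};

// Hypothetical helper: scans the raw bytes and counts each line-break style.
LineEndingCounts count_line_endings(const std::string &buf) {
    LineEndingCounts counts;
    for (std::size_t i = 0; i < buf.size(); ++i) {
        if (buf[i] == '\r') {
            if (i + 1 < buf.size() && buf[i + 1] == '\n') {
                ++counts.crlf;
                ++i;  // skip the '\n' that belongs to this pair
            } else {
                ++counts.cr;
            }
        } else if (buf[i] == '\n') {
            ++counts.lf;
        }
    }
    return counts;
}
```

If crlf comes back non-zero for the buffer you pass to pcre_exec, a pattern that expects a bare \n directly after the closing parenthesis cannot match.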
You can usually cover all cases with \r?\n in place of \n (or even (?:\r?\n|\r)).
So, if you want to quantify the line break, it would be (?:\r?\n)+ or (?:\r?\n|\r)+.
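To see the effect of the line-break sub-pattern in isolation, here is a small sketch. Note that it uses std::regex (ECMAScript grammar) as a stand-in so it compiles without libpcre; I'm assuming these particular constructs behave the same way under PCRE. finds_function_header is a hypothetical helper, and the capture groups are simplified compared to your original.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns true if `text` contains a function header
// followed by a line break and an opening brace. `linebreak` is the
// sub-pattern under test, e.g. "\\n" or "(?:\\r?\\n)+".
bool finds_function_header(const std::string &text,
                           const std::string &linebreak) {
    const std::string pattern =
        "([a-zA-Z0-9_&*]+)\\(+([a-zA-Z0-9_ &*]+)\\)+" + linebreak + "\\{";
    return std::regex_search(text, std::regex(pattern));
}
```

With the input "my_function(char *str)\r\n{", passing "\\n" as the line-break sub-pattern finds nothing, while "(?:\\r?\\n)+" matches both that input and the \n-only version.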
The other option might be to try the line break construct instead (I think it's \R), which is available in newer versions of PCRE.
If that doesn't work, it's something else.