I'm trying to check the syntax of an input file that contains the rules for my project.
I want to verify whether each rule is well-formed. This is my regex:
\s*.*\$\s*..*\$\s*\|}\s*.*\s*,*
Which finds this text:
sometimes $so$ |} hello,
life $good$ |} hello,
not $that$ |} hello
In Python, I use re.findall to find the matching text, join the matches, and then compare the length of the result with the length of the original text. But for some reason it doesn't work.
The code: rule_syntax_check = re.findall("\s*.*\$\s*..*\$\s*\|}\s*.*\s*,*", RULES, re.DOTALL)
For example, this input should be reported as an error:
sometimes $so$ |} hello,
life $good$ | } hello,
not $that$ |} hello
But it matches the second line too, so the number of characters matched by findall is the same as the length of the input. Is there another approach, or what am I missing?
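The problem is exactly that you are using the re.DOTALL (a.k.a. re.S) flag. DOTALL means that the dot matches even newlines, so a greedy .* can run across line breaks and swallow the malformed line. If you take the flag out, the dot no longer matches newlines and .* cannot span to a new line.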
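To see the effect of the flag, here is a small sketch using the pattern and the sample input from the question. The variable names are only illustrative.

```python
import re

# Pattern and sample input taken from the question; the second line is malformed ("| }").
pattern = r"\s*.*\$\s*..*\$\s*\|}\s*.*\s*,*"
text = "sometimes $so$ |} hello,\nlife $good$ | } hello,\nnot $that$ |} hello"

# With DOTALL the greedy .* crosses newlines, so one match covers the whole input.
with_dotall = re.findall(pattern, text, re.DOTALL)

# Without DOTALL the dot stops at newlines and the malformed line is left unmatched.
without_dotall = re.findall(pattern, text)

print(len("".join(with_dotall)) == len(text))     # length check passes by accident
print(len("".join(without_dotall)) == len(text))  # length check now fails
```

With the flag removed, the length comparison from the question detects that something is wrong, although it still does not say which rule is the bad one.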
However, a better solution would be to test each record separately; e.g., if they are separated by commas, you'd first split by , and then use re.match to match a single rule against the regular expression. Note that re.match is anchored only at the start of the string, not at the end, so you need to add an extra $ to make sure that a match against the exact string is required (though it is not necessary here).
Something like:
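import re
rules_split = RULES.split(',')
for rule in rules_split:
    # re.match anchors at the start; the trailing $ anchors the end
    if not re.match(r'\s*.*\$\s*.+\$\s*\|}.*$', rule):
        print('Invalid rule:', rule)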
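A slightly fuller sketch of the per-rule check, assuming rules are comma-separated and that surrounding whitespace and empty fragments (e.g. after a trailing comma) can be ignored. The helper name find_invalid_rules is made up for illustration, and re.fullmatch (Python 3.4+) is used in place of re.match plus a trailing $.

```python
import re

# Same pattern as above; fullmatch requires the whole rule to match,
# so no explicit $ anchor is needed.
RULE_PATTERN = re.compile(r'\s*.*\$\s*.+\$\s*\|}.*')

def find_invalid_rules(rules_text):
    """Return the comma-separated rules that do not match RULE_PATTERN."""
    invalid = []
    for rule in rules_text.split(','):
        rule = rule.strip()
        if not rule:
            continue  # skip empty fragments, e.g. after a trailing comma
        if RULE_PATTERN.fullmatch(rule) is None:
            invalid.append(rule)
    return invalid

rules = "sometimes $so$ |} hello,\nlife $good$ | } hello,\nnot $that$ |} hello"
print(find_invalid_rules(rules))  # ['life $good$ | } hello']
```

Returning the offending rules, rather than comparing lengths, makes it possible to report exactly which line of the input file is malformed.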