I was looking into the google's re2
implementation, I want to do the regex match using regex given by some untrusted source. I have gone through the following resources -
With PCRE, we can get into some really bad cases and it can take exponential time to match. e.g. for regex (.*)+b
and string like aaaaaaaaa...
it can take O(c^n)
time where n
is length of string.
Some more DOS expressions documented here.
I understand that the re2
will do the match in linear time for a given regex expression against a given length of a string. Now, my question is if I am able to compile a regex in re2
, and I am putting a restriction on the length of a regex pattern then am I guaranteed that it won't run into some DOS cases mentioned in the above link or for some other regex pattern?
re2
has O(n) worst-case complexity for a given string of length n
, unlike some others regex implementations that support backtracking/look aheads.
So as long as both the length of string and length of regex pattern are capped, and you are validating the regex patterns beforehand, the re2
cannot be DOSed.
But while migrating some patterns from PCRE or otherwise using it, you need to make sure that the regex pattern can be compiled as the implementation doesn't support many PCRE functionalities like backreferences.