
Termination issue in pcre


I am building rules for my Snort IDS and trying to detect the Billion Laughs attack, which is nothing more than nested expansion of predefined entities. Snort rules may contain PCRE, so I am trying to build a smarter rule for this attack. Below is a simple form of the attack, with random lines in between the ENTITY lines.

<!DOCTYPE data [
<!ENTITY a0 "dos" >
<!ENTITY a1 "&a0;&a0;&a0;&a0;">
<!ENTITY a2 "&a1;&a1;&a1;&a1;&a1;&a1;">
<!ENTITY a1 "&a2;&a2;&a2;&a2;&a2;&a2;">
test
<!ENTITY a1 "&a2;&a2;&a2;&ertertert;&a2;&a2;">
<!ENTITY a1 "&a2;&a2;&a2;&ertertert;&a2;&a2;">


<!ENTITY a1 "&a2;&a2;&a2;&a2;&a2;&a2;">
d
dd

<html abc>
a

<!ENTITY a2 "&a3;&a3;&a3;&a3;&a3;">
<!ENTITY a1 "&a0;&a0;&a0;&a0;&d5;">
]>
<data>&a2;</data>

And this is my current rule:

(<!ENTITY\s[a-zA-Z0-9]*\s"(&[a-zA-Z0-9]+;){4,}">(\s?)[^]]*){5,}

To explain the goal I want to achieve:

The rule has to trigger whenever there are at least 5 ENTITY lines, each with at least 4 &-references. If all 5 lines directly follow one another, there is no problem, but the ENTITY lines do not need to be adjacent. So I have to consume everything between two ENTITY lines, which turns the whole thing into a big termination problem: [^]]* matches everything except a ], so it also swallows whole ENTITY lines and makes my {5,} quantifier totally useless. So far I can't find a good solution for this.
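
To make the goal concrete, here is a minimal Python sketch of the condition the rule should express. It only counts qualifying ENTITY declarations and is not a Snort rule; the names ENTITY_LINE and looks_like_billion_laughs, the min_lines parameter, and the sample payload are made up for illustration, and Python's re module stands in for PCRE.

```python
import re

# One ENTITY declaration whose value holds at least 4 entity references.
# Hypothetical helper pattern, modelled on the per-line part of the rule above.
ENTITY_LINE = re.compile(r'<!ENTITY\s[a-zA-Z0-9]*\s"(?:&[a-zA-Z0-9]+;){4,}">')


def looks_like_billion_laughs(payload: str, min_lines: int = 5) -> bool:
    """Return True if payload has at least min_lines qualifying ENTITY declarations."""
    return len(ENTITY_LINE.findall(payload)) >= min_lines


# Illustrative payload: five qualifying declarations with unrelated lines between them.
sample = """<!DOCTYPE data [
<!ENTITY a0 "dos" >
<!ENTITY a1 "&a0;&a0;&a0;&a0;">
test
<!ENTITY a2 "&a1;&a1;&a1;&a1;&a1;&a1;">
<html abc>
<!ENTITY a3 "&a2;&a2;&a2;&a2;&a2;&a2;">
<!ENTITY a4 "&a3;&a3;&a3;&a3;&a3;">
<!ENTITY a5 "&a4;&a4;&a4;&a4;&d5;">
]>
<data>&a5;</data>"""

print(looks_like_billion_laughs(sample))
```

The a0 declaration does not count because its value contains no references, so exactly five declarations qualify here.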

Thanks for your help guys!


Solution

  • You may use

    (?s)<!ENTITY\s[a-z0-9]*\s"(&[a-zA-Z0-9]+;){4,}">(?:.*?<!ENTITY\s[a-z0-9]*\s"(&[a-zA-Z0-9]+;){4,}">){4,}
    


    Details

    • (?s) - DOTALL mode on, . now matches any char, including line breaks
    • <!ENTITY - a literal <!ENTITY substring
    • \s - a whitespace
    • [a-z0-9]* - 0+ lowercase letters / digits (use [a-zA-Z0-9]* to also allow uppercase names, as in the question)
    • \s - a whitespace
    • " - a "
    • (&[a-zA-Z0-9]+;){4,} - 4 or more repetitions of &, 1+ alphanumeric chars and then ;
    • "> - a "> substring
    • (?: - start of a non-capturing group matching....
      • .*? - any 0+ chars, as few as possible
      • <!ENTITY\s[a-z0-9]*\s"(&[a-zA-Z0-9]+;){4,}"> - same pattern as above
    • ){4,} - ... 4 or more times.
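
As a quick sanity check, the pattern can be tried outside Snort. The sketch below uses Python's re module as a stand-in for PCRE (an assumption; the constructs used here behave the same way in both), and the make_payload helper and its generated input are hypothetical test data, not real attack traffic.

```python
import re

# The pattern from the answer, split across two string literals for readability.
BILLION_LAUGHS = re.compile(
    r'(?s)<!ENTITY\s[a-z0-9]*\s"(&[a-zA-Z0-9]+;){4,}">'
    r'(?:.*?<!ENTITY\s[a-z0-9]*\s"(&[a-zA-Z0-9]+;){4,}">){4,}'
)


def make_payload(n: int) -> str:
    """Build n qualifying ENTITY declarations, each followed by a filler line."""
    lines = []
    for i in range(1, n + 1):
        refs = f"&a{i - 1};" * 4  # four references to the previous entity
        lines.append(f'<!ENTITY a{i} "{refs}">')
        lines.append("random filler")
    return "\n".join(lines)


# Five qualifying declarations separated by filler: the pattern matches.
print(BILLION_LAUGHS.search(make_payload(5)) is not None)

# Only four qualifying declarations: no match.
print(BILLION_LAUGHS.search(make_payload(4)) is not None)
```

The lazy .*? is what lets the repeated group skip arbitrary lines and stop at the next qualifying declaration, so the {4,} quantifier counts declarations instead of being swallowed by the filler.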