Search code examples
javaregexdenial-of-service

Regex vulnerable to polynomial runtime


How do I improve this code? SonarQube is highlighting that the regex pattern that could become really slow and produce denial of service. Here's the code:

  // Single quotes
  // Double quotes
  // Control characters (x01 - x1f)
  private static final String SQL_INJECTION_PATTERN = ".*['\"\\x01-\\x1f].*";

  if (myValue.matches(SQL_INJECTION_PATTERN)) { // Here's the problem!
    ...
  }

I need to find if any of those chars is present, to take an action.


Solution

  • Just ignore the warning. There aren't many regexes that have potentially exponential performance characteristics. If you do not let your users choose the regex (i.e. imagine you have a search feature where the user enters a regex or something you use as part of a regex in a search-for-this form field and they can submit it, then you'd be vulnerable here) and you don't try to apply such a regex that you wrote yourself to user input (trivially fixed by.. not having any such regexes in your code base)...

    it doesn't apply to you.

    But you do have a security leak here

    You're engaging in an act known as denylisting. An older term for this is 'blacklisting'. You list things that you know are bad, and scan for these. Any input that contains bad things is disallowed. The rest is allowed.

    Effectively you're saying: Unless I can show conclusively it is bad, we assume it is good.

    This is a broken approach to security.

    Allowlisting is the correct approach: Unless you can prove the input is safe, do not run it.

    Specifically, for SQL injection pattern, the solution is to let the SQL engine take care of it: Make PreparedStatements, and use .setX to set them - or use a library like JDBI or JOOQ that does this for you. Then there is simply no reason to worry about SQL injection whatsoever.

    Note that the actual security leak you do have is of the 'the hacker will p0wn the entire box by gaining admin creds on it' variety. Compared to 'users might be able to DOS your box', it's.. a few million times worse.


    EDIT: Here is an example of an allowlist approach in the same vein as the code you have (and the same caveat: Your security scanner may complain about a potential DOS attack venue here, you can ignore it, or.. see later paragraph):

    
    private static final String SQL_INJECTION_PATTERN = ".*[a-zA-Z0-9_ -].*";
    
      if (!myValue.matches(SQL_INJECTION_PATTERN)) throw new IllegalArgumentException();
    

    This code will check if the input consists solely of a very specific limited range of alphanums (just the ascii chars), and dash, and underscore, and space, and disallows everything that doesn't consist just of those things.

    This may still be unsafe - it depends on what you do with it. SQL has keywords, for example. If certain keywords are bad who knows - you'd need to elaborate considerably on what you're doing with user input here.

    The central point stands: It is highly likely your entire approach is wrong and you need to be using preparedstatements.

    One way to get rid of the warning is to stop using regexes. You can do this kind of scan just as easily with a for loop (and it is faster too, if anything):

    for (int i = 0; i < sql.length(); i++) {
      if ((i < 'a' || i > 'z') && (i < 'A' && i > 'Z') && (i < '0' || i > '9') && i != ' ' && i != '_' && i != ' ') throw new IllegalArgumentException();
    

    But, as a general rule, rewriting code solely because you're trying to work around an overly aggressive linting tool is an incredibly perniciously dangerous thing to do to a codebase. Programming is hard enough as is.