spring-boot validation security xss esapi

Using ESAPI library and a regex pattern to filter XSS injection in Spring Boot - am I doing it right?


I'm currently working on a Spring Boot project that requires filtering user input to prevent XSS injection attacks. In my implementation, I'm using the ESAPI library with the utility ESAPI.encoder().canonicalize(request) to sanitize user input.

However, in addition to using ESAPI, I'm also using a regex pattern to filter out any potential XSS attacks. Here's the regex pattern I'm using:

import java.util.regex.Pattern;

// Block-list of patterns applied to incoming request values
private static final Pattern[] scriptPatterns = {
    Pattern.compile("<script>(.*?)</script>", Pattern.CASE_INSENSITIVE),
    Pattern.compile("src[\r\n]*=[\r\n]*\\\'(.*?)\\\'", Pattern.CASE_INSENSITIVE | Pattern.MULTILINE | Pattern.DOTALL),
    Pattern.compile("</script>", Pattern.CASE_INSENSITIVE),
    Pattern.compile("<script(.*?)>", Pattern.CASE_INSENSITIVE | Pattern.MULTILINE | Pattern.DOTALL),
    Pattern.compile("eval\\((.*?)\\)", Pattern.CASE_INSENSITIVE | Pattern.MULTILINE | Pattern.DOTALL),
    Pattern.compile("expression\\((.*?)\\)", Pattern.CASE_INSENSITIVE | Pattern.MULTILINE | Pattern.DOTALL),
    Pattern.compile("javascript:", Pattern.CASE_INSENSITIVE),
    Pattern.compile("vbscript:", Pattern.CASE_INSENSITIVE),
    Pattern.compile("onload(.*?)=", Pattern.CASE_INSENSITIVE | Pattern.MULTILINE | Pattern.DOTALL)
};

My question is, am I doing this right? Is it safe to use both ESAPI.encoder().canonicalize(request) and this regex pattern for filtering XSS injection attacks, or am I potentially leaving my application vulnerable? Should I be using additional utilities from the ESAPI library, or are there other alternatives I should consider? Any advice would be greatly appreciated.


Solution

  • What you are doing (the "HTML sanitization" approach) will catch only the low-hanging fruit and will miss most attacks. That approach is appropriate only when you are required to accept markup in a small number of cases (e.g., allowing bold and italics in a comment box), and even then you would use an allow-list, not a block-list like the one you seem to be constructing. (And if you must do that, use something like OWASP AntiSamy or the OWASP Java HTML Sanitizer rather than building your own.) The generally accepted best practice, however, is contextual output encoding using ESAPI's Encoder: encode untrusted data at the point of output, for the context it lands in (HTML body, HTML attribute, JavaScript, URL).

    The approach you are taking is a common mistake. So much so that I added it to our ESAPI GitHub wiki page "XSS Defense: No Silver Bullets". If you read it, it will give you some good references on how to approach XSS correctly, and it explains why the approach you are taking is insufficient.
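
To make the point concrete, here is a minimal sketch of why the block-list falls short and what output encoding does instead. Several things in it are assumptions rather than anything from the question: the class and method names (`XssFilterSketch`, `stripBlockList`, `encodeForHtmlBody`) are hypothetical, and since the question does not show how the patterns are applied, the sketch assumes each match is simply replaced with an empty string. `encodeForHtmlBody` is a hand-rolled stand-in so the example runs without ESAPI on the classpath; it covers only the HTML body context and is not a substitute for a real encoder.

```java
import java.util.regex.Pattern;

class XssFilterSketch {

    // The block-list from the question, copied as-is (array terminator added).
    private static final Pattern[] SCRIPT_PATTERNS = {
        Pattern.compile("<script>(.*?)</script>", Pattern.CASE_INSENSITIVE),
        Pattern.compile("src[\r\n]*=[\r\n]*\\\'(.*?)\\\'", Pattern.CASE_INSENSITIVE | Pattern.MULTILINE | Pattern.DOTALL),
        Pattern.compile("</script>", Pattern.CASE_INSENSITIVE),
        Pattern.compile("<script(.*?)>", Pattern.CASE_INSENSITIVE | Pattern.MULTILINE | Pattern.DOTALL),
        Pattern.compile("eval\\((.*?)\\)", Pattern.CASE_INSENSITIVE | Pattern.MULTILINE | Pattern.DOTALL),
        Pattern.compile("expression\\((.*?)\\)", Pattern.CASE_INSENSITIVE | Pattern.MULTILINE | Pattern.DOTALL),
        Pattern.compile("javascript:", Pattern.CASE_INSENSITIVE),
        Pattern.compile("vbscript:", Pattern.CASE_INSENSITIVE),
        Pattern.compile("onload(.*?)=", Pattern.CASE_INSENSITIVE | Pattern.MULTILINE | Pattern.DOTALL)
    };

    // ASSUMPTION: the filter removes every match of every pattern.
    public static String stripBlockList(String input) {
        String result = input;
        for (Pattern p : SCRIPT_PATTERNS) {
            result = p.matcher(result).replaceAll("");
        }
        return result;
    }

    // Hypothetical stand-in for HTML-body output encoding (not ESAPI's code).
    // It replaces the characters that are significant in HTML with entities.
    public static String encodeForHtmlBody(String input) {
        StringBuilder sb = new StringBuilder(input.length() + 16);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&#x27;"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // No <script> tag, no quoted src, no onload: none of the patterns match.
        String payload = "<img src=x onerror=alert(1)>";

        // Prints the payload unchanged: <img src=x onerror=alert(1)>
        System.out.println(stripBlockList(payload));

        // Prints inert text: &lt;img src=x onerror=alert(1)&gt;
        System.out.println(encodeForHtmlBody(payload));
    }
}
```

The `onerror` payload passes through the block-list untouched, while encoding at output time turns it into text the browser will not interpret as markup. In a real project you would not hand-roll the encoder; ESAPI's `Encoder` provides `encodeForHTML`, `encodeForHTMLAttribute`, `encodeForJavaScript`, and `encodeForURL`, and you pick the one matching where the value is written.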