Can't seem to get ESAPI Validator getValidInput() Working for URL Parameters


I am trying to use ESAPI Encoder to identify and canonicalize URL-encoded query parameters. It sort of works, but not in the way the API seems to indicate. Here is my class, and below is the output it generates:

CODE

package test.test;

import org.owasp.esapi.ESAPI;
import org.owasp.esapi.Validator;
import org.owasp.esapi.errors.EncodingException;
import org.owasp.esapi.errors.IntrusionException;
import org.owasp.esapi.errors.ValidationException;

public class ESAPITester {

    public static void main(String[] args) throws ValidationException,
            IntrusionException, EncodingException {

        String searchString = "-/+=_ !$*?@";
        String singleEncoded = ESAPI.encoder().encodeForURL(searchString);
        String doubleEncoded = ESAPI.encoder().encodeForURL(singleEncoded);
        Validator validator = ESAPI.validator();
        System.out.println("Searched        : " + searchString);
        System.out.println("Single encoded  : " + singleEncoded);
        System.out.println("Double encoded  : " + doubleEncoded);
        System.out.println("Decode from URL : " + ESAPI.encoder().decodeFromURL(singleEncoded));
        System.out.println("Canonicalized   : " + ESAPI.encoder().canonicalize(singleEncoded));
        System.out.println("Valid input     : " + validator.getValidInput("http", 
                searchString, "HTTPParameterValue", 100, true, true));
        System.out.println("Valid from Encoded : " + validator.getValidInput("http", 
                singleEncoded, "HTTPParameterValue", 100, true, true));

    }
}

OUTPUT

Searched        : -/+=_ !$*?@
Single encoded  : -%2F%2B%3D_+%21%24*%3F%40
Double encoded  : -%252F%252B%253D_%2B%2521%2524*%253F%2540
Decode from URL : -/ =_ !$*?@
Canonicalized   : -/+=_+!$*?@
Valid input     : -/+=_ !$*?@
log4j:WARN No appenders could be found for logger (IntrusionDetector).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Exception in thread "main" org.owasp.esapi.errors.ValidationException: http: Invalid input. Please conform to regex ^[\p{L}\p{N}.\-/+=_ !$*?@]{0,1000}$ with a maximum length of 100
    at org.owasp.esapi.reference.validation.StringValidationRule.checkWhitelist(StringValidationRule.java:144)
    at org.owasp.esapi.reference.validation.StringValidationRule.checkWhitelist(StringValidationRule.java:160)
    at org.owasp.esapi.reference.validation.StringValidationRule.getValid(StringValidationRule.java:284)
    at org.owasp.esapi.reference.DefaultValidator.getValidInput(DefaultValidator.java:214)
    at test.test.ESAPITester.main(ESAPITester.java:25)

My question is: why does getValidInput() not canonicalize the URL-encoded input parameter? The canonicalize() method does, but getValidInput() with its final argument ('canonicalize') set to true doesn't.


Solution

  • So the question becomes: why does the second validator.getValidInput() call throw an exception, when all it is expected to do is canonicalize the input and validate the result against the expected pattern? In other words, the direct call to canonicalize() works, but the call to getValidInput() fails.

    Something is very wrong here. In the version of HTTPParameterValue that you get from the OWASP source repo, the regex is ^[a-zA-Z0-9.\\-\\/+=@_ ]*$. Someone has modified your HTTPParameterValue to look more like SafeString: ^[\\s\\p{L}\\p{N}.]{0,1024}$

    See line 440.

    This is wrong. The default ESAPI values shouldn't be changed. If you need custom rules, write a brand new validator.properties entry using the established pattern.
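
    As a rough illustration of what such an entry could look like, the sketch below loads a hypothetical Validator.SearchTerm rule from properties text using only java.util.Properties and java.util.regex, without touching ESAPI itself. The rule name and its regex are assumptions made up for this example, not ESAPI defaults. It also shows why the patterns in the properties file carry doubled backslashes while your exception message shows single ones: the properties loader unescapes them.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;
import java.util.regex.Pattern;

// Sketch only: "Validator.SearchTerm" and its regex are hypothetical, not ESAPI defaults.
class CustomRuleSketch {

    // Text as it would appear in a properties file (backslashes are doubled there).
    static final String PROPERTIES_TEXT =
            "Validator.SearchTerm=^[\\\\p{L}\\\\p{N}.\\\\-/+=_ !$*?@]{0,100}$\n";

    // Loads the properties text and returns the unescaped regex stored under the key.
    static String loadRegex(String key) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(PROPERTIES_TEXT));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty(key);
    }

    // Checks an input against the custom rule.
    static boolean matches(String input) {
        return Pattern.compile(loadRegex("Validator.SearchTerm")).matcher(input).matches();
    }

    public static void main(String[] args) {
        // prints ^[\p{L}\p{N}.\-/+=_ !$*?@]{0,100}$
        System.out.println(loadRegex("Validator.SearchTerm"));
        // prints true
        System.out.println(matches("-/+=_ !$*?@"));
        // prints false (the % characters are not in the character class)
        System.out.println(matches("-%2F%2B%3D_+%21%24*%3F%40"));
    }
}
```

    If ESAPI picks up the entry the same way it does HTTPParameterValue, the type argument passed to getValidInput() would be the part after the Validator. prefix; treat that as an assumption to verify against your ESAPI version.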

    Your test will still fail, however, because the string decodes to -/+=_ !$*?@ and ? is a reserved character within HTTP queries.
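
    To see which characters of your test string are reserved, here is a small sketch in plain Java. The class and method names are made up for illustration; the reserved set is copied from the spec excerpt quoted in this answer.

```java
// Sketch only: names are hypothetical; the reserved set comes from the quoted spec text.
class QueryReservedSketch {

    // Characters the quoted spec lists as reserved within a query component.
    static final String RESERVED = ";/?:@&=+,$";

    // Returns the characters of input that are in the reserved set, in order of appearance.
    static String reservedCharsIn(String input) {
        StringBuilder found = new StringBuilder();
        for (char c : input.toCharArray()) {
            if (RESERVED.indexOf(c) >= 0) {
                found.append(c);
            }
        }
        return found.toString();
    }

    public static void main(String[] args) {
        // prints /+=$?@
        System.out.println(reservedCharsIn("-/+=_ !$*?@"));
    }
}
```

    Six of the eleven characters in the test string are reserved, and all six appear percent-encoded in your single-encoded output.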

    From an earlier URI spec (RFC 2396):

    3.4. Query Component

    The query component is a string of information to be interpreted by the resource.

      query         = *uric
    

    Within a query component, the characters ";", "/", "?", ":", "@",
    "&", "=", "+", ",", and "$" are reserved.

    As for why the input fails against the regex you're actually running, ^[\\p{L}\\p{N}.\\-/+=_ !$*?@]{0,1000}$, read the code for getValid(), the method that appears in your stack trace. You'll find it at line 266.

    Here's what you want to look at:

    public String getValid( String context, String input ) throws ValidationException
        {
            String data = null;
    
            // checks on input itself
    
            // check for empty/null
            if(checkEmpty(context, input) == null)
                return null;
    
            if (validateInputAndCanonical)
            {
                //first validate pre-canonicalized data
    
                // check length
                checkLength(context, input);
    
                // check whitelist patterns
                checkWhitelist(context, input);
    
                // check blacklist patterns
                checkBlacklist(context, input);
    
                // canonicalize
                data = encoder.canonicalize( input );
    
            } else {
    
                //skip canonicalization
                data = input;           
            }
    
            // check for empty/null
            if(checkEmpty(context, data, input) == null)
                return null;
    
            // check length
            checkLength(context, data, input);
    
            // check whitelist patterns
            checkWhitelist(context, data, input);
    
            // check blacklist patterns
            checkBlacklist(context, data, input);
    
            // validation passed
            return data;
        }

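    The whitelist regex is checked against the raw input before the method even attempts to canonicalize it. At that point your single-encoded string still contains % characters, and % is not in the character class, so checkWhitelist() throws.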
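
    The effect of that ordering can be reproduced without ESAPI. The sketch below is a simplified stand-in, not the library's code: it uses java.net.URLDecoder in place of ESAPI's canonicalize() (the two are not equivalent; URLDecoder turns + into a space, for example), and the class and method names are hypothetical. It applies the regex from your exception message in both orders.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.regex.Pattern;

// Sketch only: a stand-in for the ordering in getValid(), not ESAPI's implementation.
class ValidationOrderSketch {

    // The regex reported in the exception message.
    static final Pattern WHITELIST =
            Pattern.compile("^[\\p{L}\\p{N}.\\-/+=_ !$*?@]{0,1000}$");

    // Stand-in for canonicalization: plain URL decoding.
    static String decode(String input) {
        try {
            return URLDecoder.decode(input, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    static void checkWhitelist(String value) {
        if (!WHITELIST.matcher(value).matches()) {
            throw new IllegalArgumentException("Invalid input: " + value);
        }
    }

    // Same order as the getValid() excerpt: check the raw input, decode, check again.
    static String validateThenDecode(String input) {
        checkWhitelist(input);
        String data = decode(input);
        checkWhitelist(data);
        return data;
    }

    // Alternative order: decode first, then check only the decoded value.
    static String decodeThenValidate(String input) {
        String data = decode(input);
        checkWhitelist(data);
        return data;
    }

    public static void main(String[] args) {
        String encoded = "-%2F%2B%3D_+%21%24*%3F%40";
        // prints -/+=_ !$*?@
        System.out.println(decodeThenValidate(encoded));
        try {
            validateThenDecode(encoded);
        } catch (IllegalArgumentException e) {
            // prints Invalid input: -%2F%2B%3D_+%21%24*%3F%40
            System.out.println(e.getMessage());
        }
    }
}
```

    Only the second ordering accepts the single-encoded string, which matches what you observed: canonicalize() on its own succeeds, while getValidInput() rejects the input before canonicalizing it.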