I am trying to use ESAPI Encoder to identify and canonicalize URL-encoded query parameters. It sort of works, but not in the way the API seems to indicate. Here is my class, and below is the output it generates:
CODE
package test.test;
import org.owasp.esapi.ESAPI;
import org.owasp.esapi.Validator;
import org.owasp.esapi.errors.EncodingException;
import org.owasp.esapi.errors.IntrusionException;
import org.owasp.esapi.errors.ValidationException;
public class ESAPITester {
public static void main(String argsp[]) throws ValidationException,
IntrusionException, EncodingException {
String searchString = "-/+=_ !$*?@";
String singleEncoded = ESAPI.encoder().encodeForURL(searchString);
String doubleEncoded = ESAPI.encoder().encodeForURL(singleEncoded);
Validator validator = ESAPI.validator();
System.out.println("Searched : " + searchString);
System.out.println("Single encoded : " + singleEncoded);
System.out.println("Double encoded : " + doubleEncoded);
System.out.println("Decode from URL : " + ESAPI.encoder().decodeFromURL(singleEncoded));
System.out.println("Canonicalized : " + ESAPI.encoder().canonicalize(singleEncoded));
System.out.println("Valid input : " + validator.getValidInput("http",
searchString, "HTTPParameterValue", 100, true, true));
System.out.println("Valid from Encoded : " + validator.getValidInput("http",
singleEncoded, "HTTPParameterValue", 100, true, true));
}
}
OUTPUT
Searched : -/+=_ !$*?@
Single encoded : -%2F%2B%3D_+%21%24*%3F%40
Double encoded : -%252F%252B%253D_%2B%2521%2524*%253F%2540
Decode from URL : -/ =_ !$*?@
Canonicalized : -/+=_+!$*?@
Valid input : -/+=_ !$*?@
log4j:WARN No appenders could be found for logger (IntrusionDetector).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Exception in thread "main" org.owasp.esapi.errors.ValidationException: http: Invalid input. Please conform to regex ^[\p{L}\p{N}.\-/+=_ !$*?@]{0,1000}$ with a maximum length of 100
at org.owasp.esapi.reference.validation.StringValidationRule.checkWhitelist(StringValidationRule.java:144)
at org.owasp.esapi.reference.validation.StringValidationRule.checkWhitelist(StringValidationRule.java:160)
at org.owasp.esapi.reference.validation.StringValidationRule.getValid(StringValidationRule.java:284)
at org.owasp.esapi.reference.DefaultValidator.getValidInput(DefaultValidator.java:214)
at test.test.ESAPITester.main(ESAPITester.java:25)
My question is: Why does the getValidInput() not canonicalize the URL-encoded input parameter? I'm curious as to why the canonicalize() method does so, but getValidInput() with the final argument ('canonicalize') set to true doesn't.
In other words: why does the second validator.getValidInput() call throw an exception, when all it is expected to do is canonicalize the input and validate the result? The direct call to canonicalize() works, but the call to getValidInput() fails.
Something is very wrong here. In the version of HTTPParameterValue that you get from the OWASP source repo, the regex is ^[a-zA-Z0-9.\\-\\/+=@_ ]*$
Someone has manipulated the HTTPParameterValue to look more like SafeString: ^[\\s\\p{L}\\p{N}.]{0,1024}$
This is wrong. Don't change ESAPI's default values; if you need custom behavior, write a brand new validator.properties entry using the established pattern.
Your test will still fail, however, because the string decodes to -/+=_ !$*?@ and ? is a reserved character within HTTP queries. From RFC 2396:
3.4. Query Component
The query component is a string of information to be interpreted by the resource.
query = *uric
Within a query component, the characters ";", "/", "?", ":", "@",
"&", "=", "+", ",", and "$" are reserved.
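To see which characters of the decoded test string fall into that reserved set, here is a minimal sketch in plain Java (no ESAPI involved; the class and method names are made up for illustration):

```java
public class ReservedCharCheck {
    // Reserved characters within a query component, as listed in the quoted section.
    static final String RESERVED = ";/?:@&=+,$";

    // Returns the reserved characters found in s, in order of appearance.
    static String reservedIn(String s) {
        StringBuilder found = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (RESERVED.indexOf(c) >= 0) {
                found.append(c);
            }
        }
        return found.toString();
    }

    public static void main(String[] args) {
        // The decoded search string from the question.
        System.out.println(reservedIn("-/+=_ !$*?@")); // prints /+=$?@
    }
}
```

So ? is not the only reserved character in that string; /, +, =, $ and @ are on the list as well.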
As to why the input fails against the regex you're running with, ^[\\p{L}\\p{N}.\\-/+=_ !$*?@]{0,1000}$, read the code. At line 266 you'll see the affected method.
Here's what you want to look at:
public String getValid( String context, String input ) throws ValidationException
{
String data = null;
// checks on input itself
// check for empty/null
if(checkEmpty(context, input) == null)
return null;
if (validateInputAndCanonical)
{
//first validate pre-canonicalized data
// check length
checkLength(context, input);
// check whitelist patterns
checkWhitelist(context, input);
// check blacklist patterns
checkBlacklist(context, input);
// canonicalize
data = encoder.canonicalize( input );
} else {
//skip canonicalization
data = input;
}
// check for empty/null
if(checkEmpty(context, data, input) == null)
return null;
// check length
checkLength(context, data, input);
// check whitelist patterns
checkWhitelist(context, data, input);
// check blacklist patterns
checkBlacklist(context, data, input);
// validation passed
return data;
}
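When validateInputAndCanonical is set, the regex gets checked against the raw input before it even attempts to canonicalize it. Your URL-encoded string contains %, which the regex does not allow, so checkWhitelist() throws before canonicalize() is ever reached.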
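The effect of that ordering can be reproduced without ESAPI. The sketch below is an illustration only: it uses java.util.regex with the pattern from your exception message, and java.net.URLDecoder as a stand-in for ESAPI's canonicalization (which does more than URL decoding). The class and method names are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.regex.Pattern;

public class WhitelistOrderDemo {
    // Whitelist pattern taken from the exception message in the question.
    static final Pattern WHITELIST =
            Pattern.compile("^[\\p{L}\\p{N}.\\-/+=_ !$*?@]{0,1000}$");

    static boolean matchesWhitelist(String s) {
        return WHITELIST.matcher(s).matches();
    }

    // Plain URL decoding, used here only as a simplified stand-in
    // for encoder.canonicalize().
    static String urlDecode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        String singleEncoded = "-%2F%2B%3D_+%21%24*%3F%40";

        // Check on the raw input: fails, because '%' is not in the character class.
        System.out.println("raw input matches     : " + matchesWhitelist(singleEncoded));

        // Check after decoding: passes, all characters are whitelisted.
        String decoded = urlDecode(singleEncoded);
        System.out.println("decoded input matches : " + matchesWhitelist(decoded));
    }
}
```

If the whitelist were applied only after decoding, the encoded value would pass; because the raw input is checked first, it is rejected.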