unicode encoding sql-injection httprequest httpmodule

How to determine the encoding of a request query string

Suppose I have a .NET HttpModule that analyzes incoming requests to check for possible attacks such as SQL injection. Now suppose that a user of my application enters the following in a form field and submits it:

&#039&#032&#079&#082&#032&#049&#061&#049

Those are HTML numeric character references (decimal Unicode code points) for ' OR 1=1. So in the request I get something like:

http://example.com/?q=%26%23039%26%23032%26%23079%26%23082%26%23032%26%23049%26%23061%26%23049

In my HttpModule this looks fine (no SQL injection), but the server will correctly decode it to q=' OR 1=1 and my filter will fail.

So, my question is: Is there any way to know at that point what is the encoding used by the request query string, so I can decode it and detect the attack?

I guess the browser has to tell the server which encoding the request is in, so it can be correctly decoded. Or am I wrong?

Solution

the server will correctly decode it to q=' OR 1=1

It shouldn't. There is no valid reason(*) an application would HTML-decode the &#039... string before using it in an SQL query. HTML-decoding is a client-side occurrence.

(* there's the invalid reason: that the application author doesn't have the foggiest idea what they're doing, tries to write an input-HTML-escaping function - a misguided idea in the first place - and due to incompetence writes an input-de-escaping function instead... but that would be an unlikely case. Hopefully.)
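To make the distinction concrete, here is a minimal sketch in Python (used purely for illustration; the same layering applies in .NET). It assumes the URL from the question and shows that the percent-decoding a web stack does automatically only yields the literal character-reference text. The quote and the OR appear only if some code explicitly HTML-decodes the value afterwards.

```python
import html
from urllib.parse import parse_qs, urlsplit

url = ("http://example.com/?q=%26%23039%26%23032%26%23079%26%23082"
       "%26%23032%26%23049%26%23061%26%23049")

# Step 1: percent-decoding, which the server-side framework does for you.
q = parse_qs(urlsplit(url).query)["q"][0]
print(q)  # &#039&#032&#079&#082&#032&#049&#061&#049

# Step 2: HTML-decoding, which only happens if application code asks for it.
decoded = html.unescape(q)
print(decoded)  # ' OR 1=1
```

Unless the application performs something like step 2 before building its query, the database only ever sees the harmless string from step 1.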

Is there any way to know at that point what is the encoding used by the request query string

No. Some Web Application Firewalls attempt to get around this by applying every decoding scheme they can think of to the incoming data, and triggering if any of them match something suspicious, just in case the application happens to have an arbitrary decoder of that type sitting between the input and a vulnerable system.
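A rough sketch of that "try every decoder" approach, in Python for illustration. The names SUSPICIOUS, DECODERS, b64_decode and scan are hypothetical, and the single regex stands in for what would be a much larger rule set in a real product.

```python
import base64
import binascii
import html
import re
from urllib.parse import unquote_plus

# Deliberately naive signature; a real rule set would be far larger.
SUSPICIOUS = re.compile(r"'\s*OR\s+\d+\s*=\s*\d+", re.IGNORECASE)

def b64_decode(text):
    """Base64-decode, tolerating missing padding; return None on failure."""
    try:
        padded = text + "=" * (-len(text) % 4)
        return base64.b64decode(padded, validate=True).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError, ValueError):
        return None

# Every decoding the filter guesses the application *might* apply.
DECODERS = {
    "identity": lambda s: s,
    "url": unquote_plus,
    "html": html.unescape,
    "base64": b64_decode,
}

def scan(value):
    """Return the names of decoders whose output matches the signature."""
    hits = []
    for name, decode in DECODERS.items():
        decoded = decode(value)
        if decoded is not None and SUSPICIOUS.search(decoded):
            hits.append(name)
    return hits

print(scan("&#039&#032&#079&#082&#032&#049&#061&#049"))  # ['html']
print(scan("%27%20OR%201%3D1"))                          # ['url']
print(scan("hello world"))                               # []
```

Note that the filter has no idea whether the application actually HTML-decodes anything; it flags the first value purely on the guess that it might.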

This can result in a performance hit as well as increased false positives, and doubly so for the WAFs that try all possible combinations of two or more decoders (e.g. is T1IrMQ a base-64-encoded, URL-encoded OR 1 SQL attack, or just a car numberplate?).
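The numberplate case can be reproduced with a small sketch, again in Python and again with hypothetical names (SUSPICIOUS, DECODERS, b64_decode, scan_chained). It tries every ordered chain of two decoders, so the work grows as the number of decoders raised to the chain depth, and a loose signature is enough to flag a short, innocent-looking token.

```python
import base64
import binascii
import re
from itertools import product
from urllib.parse import unquote_plus

# Loose signature, mirroring the "OR 1" example in the text.
SUSPICIOUS = re.compile(r"\bOR\s+\d", re.IGNORECASE)

def b64_decode(text):
    """Base64-decode, tolerating missing padding; return None on failure."""
    try:
        padded = text + "=" * (-len(text) % 4)
        return base64.b64decode(padded, validate=True).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError, ValueError):
        return None

DECODERS = {"url": unquote_plus, "base64": b64_decode}

def scan_chained(value, depth=2):
    """Apply every ordered chain of `depth` decoders; return matching chains."""
    hits = []
    for chain in product(DECODERS, repeat=depth):
        current = value
        for name in chain:
            current = DECODERS[name](current)
            if current is None:  # this chain cannot decode the value
                break
        if current is not None and SUSPICIOUS.search(current):
            hits.append(chain)
    return hits

print(scan_chained("T1IrMQ"))  # [('base64', 'url')]
```

Base64-decoding T1IrMQ gives OR+1, and URL-decoding that gives OR 1, so the token is flagged even though nothing suggests the application ever decodes input in that order.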

Quite how far you take this idea is a trade-off between how many potential attacks you catch and how much negative impact you have on real users of the app. There's no one 'correct' solution because ultimately you can never provide complete protection against app vulnerabilities in a layer outside the app (aka "WAFs don't work").