php global-variables filtering sanitization request-uri

filter_input() $_SERVER["REQUEST_URI"] with FILTER_SANITIZE_URL


I'm filtering $_SERVER["REQUEST_URI"] like this:

$_request_uri = filter_input(INPUT_SERVER, 'REQUEST_URI', FILTER_SANITIZE_URL);

As explained on php.net:

FILTER_SANITIZE_URL

Remove all characters except letters, digits and $-_.+!*'(),{}|\^~[]`<>#%";/?:@&=.

However, the browser sends the REQUEST_URI value URL-encoded, so filter_input() does not actually sanitize it: the percent sign and the hex digits are all permitted characters. Say the address is

http://www.example.com/abc/index.php?q=abc��123

then the sanitized request URI is

/abc/index.php?q=abc%EF%BF%BD%EF%BF%BD123

But it should be

/abc/index.php?q=abc123

It is possible to urldecode($_SERVER["REQUEST_URI"]) first and then use filter_var() to get a sanitized value:

$_request_uri = filter_var(urldecode($_SERVER['REQUEST_URI']), FILTER_SANITIZE_URL);
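To make the difference concrete, here is a minimal sketch that uses the example URI above as a hard-coded string instead of reading $_SERVER. The variable names are hypothetical and only serve the illustration.

```php
<?php
// Hypothetical raw value, percent-encoded as the server would receive it.
$rawUri = '/abc/index.php?q=abc%EF%BF%BD%EF%BF%BD123';

// Without decoding: '%' and the hex digits are all permitted characters,
// so FILTER_SANITIZE_URL leaves the string unchanged.
$filteredOnly = filter_var($rawUri, FILTER_SANITIZE_URL);

// Decoding first exposes the raw bytes, which the filter then strips.
$decodedThenFiltered = filter_var(urldecode($rawUri), FILTER_SANITIZE_URL);

var_dump($filteredOnly === $rawUri); // bool(true)
echo $decodedThenFiltered, PHP_EOL;  // /abc/index.php?q=abc123
```

Note that urldecode() also turns "+" into a space, which the filter then removes; rawurldecode() leaves "+" untouched, if that distinction matters for the input.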

I don't know why, but the last one seems "inelegant" to me, and I'm looking for an elegant way to sanitize $_SERVER["REQUEST_URI"].

Maybe accessing a superglobal array directly ($_SERVER['REQUEST_URI']) in my code is what bothers me, hence "inelegant".

Is there an elegant way?


Solution

  • I think you could use either mod_rewrite or Apache's SetEnv directive to decode the URL on the server side. This would change REQUEST_URI in Apache and consequently the value of $_SERVER["REQUEST_URI"] in PHP.

    I don't like this solution, and you likely don't want to do this. The issues I see:

    • It does not allow for multiple GET parameters that may have different validation rules.
    • It allows for arbitrary parameters.
    • It requires permissions that a user may not have, and it changes default server behavior.
    • mod_rewrite is seldom a good solution.

    A good solution which avoids the global is to call filter_input or filter_input_array on INPUT_GET (instead of INPUT_SERVER).

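    // filter_input_array() returns null when there are no GET parameters,
    // so fall back to an empty array before rebuilding the query string.
    $urlParameters = http_build_query(
        filter_input_array(
            INPUT_GET,
            FILTER_SANITIZE_URL
        ) ?: array()
    );
    
    // SCRIPT_URL is not set on every server; on Apache it comes from mod_rewrite.
    $_request_uri = filter_input(INPUT_SERVER, 'SCRIPT_URL', FILTER_SANITIZE_URL) . ($urlParameters ? "?{$urlParameters}" : "");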
    print_r($_request_uri);
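
Because filter_input() reads from the real request, the snippet above is hard to try from the command line. The following is a rough, testable sketch of the same idea applied to a plain string. The function name sanitize_request_uri is hypothetical, parse_url() and parse_str() stand in for the server's own parsing, and only flat, scalar query parameters are assumed.

```php
<?php
/**
 * Hypothetical helper: rebuild a sanitized URI from a raw, percent-encoded one.
 * Assumes flat (non-nested) query parameters.
 */
function sanitize_request_uri(string $rawUri): string
{
    $path  = parse_url($rawUri, PHP_URL_PATH);
    $query = parse_url($rawUri, PHP_URL_QUERY);

    // parse_str() decodes the values, much like PHP does when filling $_GET.
    $params = array();
    if (is_string($query)) {
        parse_str($query, $params);
    }

    // Apply the same filter to every decoded parameter.
    $clean = filter_var_array($params, FILTER_SANITIZE_URL);

    // http_build_query() re-encodes whatever survived the filter.
    $queryString = http_build_query(is_array($clean) ? $clean : array());

    $cleanPath = is_string($path) ? (string) filter_var($path, FILTER_SANITIZE_URL) : '';

    return $cleanPath . ($queryString !== '' ? "?{$queryString}" : '');
}

echo sanitize_request_uri('/abc/index.php?q=abc%EF%BF%BD%EF%BF%BD123'), PHP_EOL;
// /abc/index.php?q=abc123
```

The path is filtered without being decoded here, which mirrors the snippet above. Whether the path should also be decoded depends on the application.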
    

    A better solution would be to whitelist specific parameters, validate each one with its own rules, and use these parameters directly (avoiding building and parsing $_request_uri).

    $_request_parameters = filter_input_array(
        INPUT_GET,
        array(
            'q' => FILTER_SANITIZE_URL,
        )
    );
    
    print_r($_request_parameters['q']);
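
As a sketch of what per-parameter rules could look like, the example below runs the same kind of definition through filter_var_array() on a hard-coded array, so it can be run without a web request. The 'page' and 'debug' parameters and the variable names are assumptions made for illustration, not part of the original question.

```php
<?php
// Hypothetical, already-decoded parameters standing in for $_GET.
$params = array(
    'q'     => "abc\xEF\xBF\xBD\xEF\xBF\xBD123",
    'page'  => '2',
    'debug' => '1', // not whitelisted below, so it is dropped
);

// One rule per expected parameter; anything not listed is ignored.
$rules = array(
    'q'    => FILTER_SANITIZE_URL,
    'page' => array(
        'filter'  => FILTER_VALIDATE_INT,
        'options' => array('min_range' => 1),
    ),
);

$clean = filter_var_array($params, $rules);

var_dump($clean['q']);    // string(6) "abc123"
var_dump($clean['page']); // int(2)
```

With a real request, swapping filter_var_array($params, $rules) for filter_input_array(INPUT_GET, $rules) should give the same shape of result. A listed parameter that is missing from the input comes back as null, and one that fails validation comes back as false.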