Tags: regex, apache, .htaccess, mod-rewrite, url-rewriting

Strip any parameter that is not whitelisted from a URL


I need to strip every query parameter that is not in a whitelist. For example, this URL:

abc.com/somePage?phone=1234&stripAway=asd&fax=324&stripDown=disappear&zip=zip

should become:

abc.com/somePage?phone=1234&fax=324&zip=zip

Related question: "Rewrite URL using .htaccess file in case there is used parameters which is not in white list"

P.S. I need a more complex solution than the first two answers provide. To make this clearer, here are some more examples:

abc.com/somePage2?stripAway=asd&fax=324&stripDown=disappear&phone=1234&zip=zip

should become:

abc.com/somePage2?fax=324&phone=1234&zip=zip

and

abc.com/somePage3?stripAway=asd&stripDown=disappear

should become:

abc.com/somePage3
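A minimal PHP sketch of the transformation described above, done on the URL string rather than in .htaccess, to pin down the expected behaviour. The function name `strip_unlisted_params` is hypothetical. It assumes the URL has no `#fragment` and that a parameter's name is everything before its first `=`. It keeps the whitelisted pairs in their original order and drops the `?` when none remain:

```php
<?php
// Hypothetical helper: keeps only whitelisted query parameters, in their
// original order, and drops the "?" when nothing is left.
// Assumes the URL has no "#fragment".
function strip_unlisted_params(string $url, array $whitelist): string
{
    $pos = strpos($url, '?');
    if ($pos === false) {
        return $url; // no query string, nothing to strip
    }

    $path = substr($url, 0, $pos);
    $kept = [];
    foreach (explode('&', substr($url, $pos + 1)) as $pair) {
        // The parameter name is everything before the first "="
        $name = explode('=', $pair, 2)[0];
        if (in_array($name, $whitelist, true)) {
            $kept[] = $pair; // keep the raw pair, so its encoding is untouched
        }
    }

    return $kept ? $path . '?' . implode('&', $kept) : $path;
}

$whitelist = ['phone', 'fax', 'zip'];
echo strip_unlisted_params('abc.com/somePage3?stripAway=asd&stripDown=disappear', $whitelist), "\n";
```

Because it works on the raw `name=value` pairs, the kept values are passed through exactly as they appeared in the original URL.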

Solution

  • If the parameters always arrive in the same order, anubhava's answer is much better than mine. If the order is not guaranteed, you can use the following rule. It uses the technique that Jon Lin uses in this answer.

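    RewriteEngine On
    # Only fire if some parameter name is not exactly phone, fax or zip
    RewriteCond %{QUERY_STRING}           ^(.*&|)(?!(phone|fax|zip)=)[^=]+=
    # Each condition appends one kept parameter to %1, in front of the ## separator
    RewriteCond ##%{QUERY_STRING}         (.*?)##(|.*&)(phone)=([^&]+)
    RewriteCond %1&%3=%4##%{QUERY_STRING} (.*?)##(|.*&)(fax)=([^&]+)
    RewriteCond %1&%3=%4##%{QUERY_STRING} (.*?)##(|.*&)(zip)=([^&]+)
    # %3=%4 is the last pair captured (zip); %1 holds &phone=...&fax=...
    RewriteRule ^(.*)$ $1?%3=%4%1 [R,L]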
    

    So, what does it do? The first condition is only true if the query string contains a parameter whose name is not phone, fax or zip (checked with a negative lookahead). The next three RewriteCond lines each capture one parameter you want to keep and append it to a query string that is built up in front of the custom ## separator; the %1 backreference carries that partial string into the next condition. Strange things will happen if that separator ever appears in your query string.

    The downside of this approach is that if any of the three parameters is missing, the rule is not applied at all. Personally, I would enforce the whitelist in the page itself rather than try to filter it in .htaccess.
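To make the ## accumulation easier to follow, here is a rough PHP replay of the rule above using the same patterns (mod_rewrite also uses PCRE regular expressions). The function `simulate_first_rule` is hypothetical and only models the query-string logic, not the redirect itself. Its first check requires `=` right after the whitelisted name, so that a parameter like `phonebook` is not mistaken for `phone`. It returns null whenever the rule would not fire, including when one of the three parameters is missing:

```php
<?php
// Hypothetical helper: replays the query-string logic of the first rule set.
// Returns the rewritten query string, or null when the rule would not fire.
function simulate_first_rule(string $qs): ?string
{
    // Condition 1: at least one parameter name is not phone, fax or zip
    if (!preg_match('/^(.*&|)(?!(phone|fax|zip)=)[^=]+=/', $qs)) {
        return null;
    }

    // Conditions 2-4: the test string is "<kept so far>##<original query>"
    $subject = '##' . $qs;
    foreach (['phone', 'fax', 'zip'] as $name) {
        if (!preg_match('/(.*?)##(|.*&)(' . $name . ')=([^&]+)/', $subject, $m)) {
            return null; // a missing parameter makes the whole chain fail
        }
        // Next test string, like "%1&%3=%4##%{QUERY_STRING}"
        $subject = $m[1] . '&' . $m[3] . '=' . $m[4] . '##' . $qs;
    }

    // RewriteRule: ?%3=%4%1, using the captures of the last condition (zip)
    return $m[3] . '=' . $m[4] . $m[1];
}

echo simulate_first_rule('phone=1234&stripAway=asd&fax=324&stripDown=disappear&zip=zip'), "\n";
```

For the first example in the question this yields `zip=zip&phone=1234&fax=324`: the kept parameters come back reordered, which does not matter to the server, while a query such as `stripAway=asd&stripDown=disappear` is not rewritten at all.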


    Edit: If some parameters are optional, you can use the following monstrosity:

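    RewriteEngine On
    # Remove one parameter whose name is not phone, fax or zip:
    # %2 is the text before it, %4 the text after it (starting with & if anything follows)
    RewriteCond %{QUERY_STRING} ^((.*?)&|)(?!(phone|fax|zip)=)[^=]+=[^&]+(&.*|)$
    RewriteRule ^(.*)$ $1?%2%4 [R,E=Redir:1]
    
    # If the removed parameter came first, drop the leading & it leaves behind
    RewriteCond %{QUERY_STRING} ^&(.*)$
    RewriteRule ^(.*)$ $1?%1 [R,E=Redir:1]
    
    # Stop processing and redirect if either rule above rewrote the query string
    RewriteCond %{ENV:Redir} =1
    RewriteRule ^ - [R,L]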
    

    Basically, if the query string contains a parameter that does not match the names in the lookahead, the first rule captures the part before that parameter and the part after it and joins them into a new query string. The second rule strips the leading & that is left over when the removed parameter was the first one (or when the query string simply starts with &), and the last rule ends processing with a single redirect to keep the number of redirects down. Note that each redirect removes only one non-whitelisted parameter, so if a request contains too many of them, the browser will give up with a too-many-redirects error.
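A rough PHP replay of these three rules, to show how many round trips a request needs. It assumes, as the rules themselves do, that each rule sees the query string produced by the previous one within the same pass, and that the browser makes one new request per redirect. The name `simulate_redirects` is hypothetical, and the lookahead requires `=` after the whitelisted name:

```php
<?php
// Hypothetical helper: replays the "optional parameters" rules on a raw query
// string, one loop iteration per redirect, and counts the redirects.
function simulate_redirects(string $qs): array
{
    // Group 2 = text before the removed parameter, group 4 = text after it
    $strip = '/^((.*?)&|)(?!(phone|fax|zip)=)[^=]+=[^&]+(&.*|)$/';
    $redirects = 0;

    while (true) {
        $redir = false;
        // Rule 1: remove one parameter that is not whitelisted
        if (preg_match($strip, $qs, $m)) {
            $qs = $m[2] . $m[4];
            $redir = true;
        }
        // Rule 2: drop a leading & left behind by rule 1
        if (preg_match('/^&(.*)$/', $qs, $m)) {
            $qs = $m[1];
            $redir = true;
        }
        // Rule 3: no rule fired, so the browser gets the page instead of a redirect
        if (!$redir) {
            return [$qs, $redirects];
        }
        $redirects++;
    }
}

[$clean, $count] = simulate_redirects('stripAway=asd&fax=324&stripDown=disappear&phone=1234&zip=zip');
echo $clean, ' after ', $count, " redirects\n";
```

For the `somePage2` example this gives `fax=324&phone=1234&zip=zip` after 2 redirects, one per stripped parameter, which is why a long list of unwanted parameters eventually trips the browser's redirect limit.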


    Instead, I would recommend filtering in the page itself. In PHP it would look something like the following code. This makes the whitelist easier to maintain and doesn't break everything if you ever switch to another web server.

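    <?php
    // Only these query parameters are allowed through
    $whitelist = array( 'phone', 'fax', 'zip' );
    
    // Remove every GET parameter whose name is not whitelisted.
    // Strict comparison stops a numeric key such as 0 from loosely
    // matching 'phone' on PHP versions before 8.
    foreach( $_GET as $k => $v ) {
      if( !in_array( $k, $whitelist, true ) )
        unset( $_GET[$k] );
    }
    
    // Repeat the same loop for $_POST and $_REQUEST if the page reads them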
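The loop above only hides the extra parameters from the script; the address bar still shows them. If the visible URL should also be cleaned, as the question asks, the page can redirect itself. A sketch under these assumptions: `whitelist_redirect_target` is a hypothetical helper, values are re-encoded by `http_build_query` (so their encoding may differ from the original request), and a 302 is sent, the same status the `[R]` flag uses by default:

```php
<?php
// Hypothetical helper: returns the URL to redirect to when the request carries
// parameters outside the whitelist, or null when nothing needs stripping.
function whitelist_redirect_target(string $path, array $get, array $whitelist): ?string
{
    // array_flip turns the list into a key lookup table for array_intersect_key
    $clean = array_intersect_key($get, array_flip($whitelist));
    if (count($clean) === count($get)) {
        return null; // nothing was stripped, so no redirect is needed
    }
    // http_build_query re-encodes the values and returns '' for an empty array
    $query = http_build_query($clean);
    return $query === '' ? $path : $path . '?' . $query;
}

// At the top of the page (skipped on the command line, where REQUEST_URI is not set)
if (isset($_SERVER['REQUEST_URI'])) {
    $path = parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH) ?: '/';
    $target = whitelist_redirect_target($path, $_GET, ['phone', 'fax', 'zip']);
    if ($target !== null) {
        header('Location: ' . $target, true, 302);
        exit;
    }
}
```

Unlike the .htaccess rules, this needs only one redirect however many unwanted parameters there are, and the whitelist lives next to the code that uses it.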