
Is it possible to limit the number of $_GET parameters in a Zend Framework application that uses Zend_Cache_Backend_Static to cache static pages as HTML?


I've just set up static page caching using Zend_Cache_Backend_Static to serve cached HTML files in my application, which is working great. My only concern is the way it caches requests with $_GET parameters. Because it automatically creates a folder structure that maps to the supplied URL route, is this a potential security risk when large numbers of $_GET parameters are deliberately appended to existing pages, for example by hitting a maximum directory depth or a maximum filename length?

For example: At the moment I'm caching my pages into /public/cache/static/ so using the standard router /module/controller/action/param1/val1/param2/val2 or standard query string /module/controller/action?param1=val1&param2=val2 would create the following directory structures:

/public/cache/static/module/controller/action/param1/val1/param2/val2.html 
/public/cache/static/module/controller/action?param1=val1&param2=val2.html

Allowing people to create a directory structure in this way (however limited) worries me slightly. Zend_Cache_Backend_Static and the corresponding Zend_Cache_Frontend_Capture must both be set in the ini file, not via the Zend_Cache factory, and they don't appear to have any setup options.

Could it just be a case of replacing the default router with custom routes that limit the number of $_GET variables? Is this possible, or would I need to specify exactly the variables I need for each route (not the end of the world, but a bit more limiting)?

Update:

So the existing rewrite rule to handle the static cache is as follows:

RewriteCond %{REQUEST_METHOD} GET
RewriteCond %{DOCUMENT_ROOT}/cached/index.html -f
RewriteRule ^/*$ cached/index.html [L]

RewriteCond %{REQUEST_METHOD} GET
RewriteCond %{DOCUMENT_ROOT}/cached/%{REQUEST_URI}\.html -f
RewriteRule .* cached/%{REQUEST_URI}\.html [L]

RewriteCond %{REQUEST_FILENAME} -s [OR]
RewriteCond %{REQUEST_FILENAME} -l [OR]
RewriteCond %{REQUEST_FILENAME} -d
RewriteRule ^.*$ - [NC,L]

RewriteRule ^.*$ index.php [NC,L]

If the request hits a page in the static cache it will send that html page. If not it will hit Zend Framework and generate it.

I could add the following to the start:

RewriteCond %{QUERY_STRING} \S
RewriteRule [^\?]+ /$0? [R=301,L]

This will wipe my query string altogether. That is fine, as I can still pass $_GET variables using the URL path method of Zend Framework (which I have also limited by providing very explicit routes). But is it possible to do this without redirecting?


Solution

  • OK, so the RewriteRule stripping the query string will work without a redirect.

    The issue, I suspect, is that Zend_Cache_Backend_Static uses $_SERVER['REQUEST_URI'] somewhere along the line and therefore still sees the original URL, query string included. My knowledge of mod_rewrite is pretty slim, and I didn't realise that this value isn't altered by a rewrite.

    So, to prevent files and directories being created by massive query strings I've had to do the following things:

    Firstly for standard query strings:

    Strip the query string at the start of my mod_rewrite, without redirecting:

    RewriteCond %{QUERY_STRING} \S
    RewriteRule [^\?]+ /$0?
    

    In my index.php I then change $_SERVER['REQUEST_URI'] to match the rewritten URL by stripping the query string, which means I no longer need to hack ZF:

    $queryIndex = strpos($_SERVER['REQUEST_URI'], '?');
    if($queryIndex !== false) {
        $_SERVER['REQUEST_URI'] = substr($_SERVER['REQUEST_URI'], 0, $queryIndex);
    }
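
    The same step can be wrapped in a small function so it can be checked in isolation. This is only a sketch: the name stripQueryString is hypothetical (it is not part of Zend Framework), and it assumes the query string starts at the first '?' in the URI:

    ```php
    <?php
    // Hypothetical helper (not a ZF function): returns the URI with
    // everything from the first '?' onwards removed.
    function stripQueryString($uri)
    {
        $queryIndex = strpos($uri, '?');
        if ($queryIndex === false) {
            return $uri; // no query string present, leave the URI untouched
        }
        return substr($uri, 0, $queryIndex);
    }

    // In index.php, before the front controller is dispatched:
    if (isset($_SERVER['REQUEST_URI'])) {
        $_SERVER['REQUEST_URI'] = stripQueryString($_SERVER['REQUEST_URI']);
    }
    ```

    For example, stripQueryString('/article/foo?x=1') gives '/article/foo', while a URI without a '?' is returned unchanged.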
    

    This will now prevent ANY query string from being interpreted by my application. To pass variables to pages I am therefore using Zend Framework url path parameters. To prevent these from creating excessively deep cache folders, I've replaced the default route with a few very explicitly defined routes in the Bootstrap:

    $frontController = Zend_Controller_Front::getInstance(); 
    $router = $frontController->getRouter();
    
    $route = new Zend_Controller_Router_Route(
        ':module/:controller/:action',
        array(
            'module' => 'default',
            'controller' => 'index',
            'action' => 'index'
        )
    );
    
    $router->addRoute('default', $route);
    
    $route = new Zend_Controller_Router_Route(
        'article/:alias',
        array(
            'module' => 'default',
            'controller' => 'article',
            'action' => 'index',
            'alias' => ''
        )
    );
    
    $router->addRoute('article', $route);
    

    Here, I've replaced the default route so no additional parameters are allowed. Any actions which do require parameters therefore have to be explicitly set, for example in my second route. This means there could potentially be a lot of defined routes. Thankfully this is not the case in my particular application.

    A way around restricting the routes so much, while still allowing some GET params via ZF URL paths, is to limit the number of slashes in the REQUEST_URI. This effectively caps the directory depth of the static page cache (10 below). This can also be done in index.php:

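    // Truncate at the 11th slash so that at most 10 slashes remain.
    if (substr_count($_SERVER['REQUEST_URI'], '/') > 10) {
        preg_match_all('/\//', $_SERVER['REQUEST_URI'], $capture, PREG_OFFSET_CAPTURE);
        // $capture[0][10] is the 11th match; element [1] is its byte offset.
        $_SERVER['REQUEST_URI'] = substr($_SERVER['REQUEST_URI'], 0, $capture[0][10][1]);
    }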
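
    An alternative sketch of the same depth limit, using explode() and array_slice() instead of a regular expression. The function name limitPathDepth and its $maxSlashes parameter are hypothetical, and it assumes the query string has already been stripped from the URI:

    ```php
    <?php
    // Hypothetical helper (not a ZF function): truncates the URI so that
    // it contains at most $maxSlashes slashes.
    function limitPathDepth($uri, $maxSlashes = 10)
    {
        // N slashes split the URI into N + 1 segments (the first segment
        // is empty when the URI starts with '/').
        $segments = explode('/', $uri);

        // Keeping $maxSlashes + 1 segments leaves exactly $maxSlashes
        // slashes once the segments are joined again; shorter URIs are
        // returned unchanged.
        return implode('/', array_slice($segments, 0, $maxSlashes + 1));
    }

    // In index.php, after the query string has been removed:
    if (isset($_SERVER['REQUEST_URI'])) {
        $_SERVER['REQUEST_URI'] = limitPathDepth($_SERVER['REQUEST_URI'], 10);
    }
    ```

    With a limit of 2, limitPathDepth('/a/b/c', 2) gives '/a/b', so anything beyond the allowed depth is silently dropped rather than rejected. Whether dropping or returning an error is preferable depends on the application.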