
Generating a unique id for a given string using php


I'm using Zend_Cache_Core with Zend_Cache_Backend_File to cache results of queries executed for a model class that accesses the database.

Basically, the queries themselves should form the id under which the results are cached; the only problem is that they are too long. Zend_Cache_Backend_File doesn't throw an exception and PHP doesn't complain, but the cache file isn't created.

I've come up with a solution that is not efficient at all, storing any executed query along with an autoincrementing id in a separate file like so:

0->>SELECT * FROM table
1->>SELECT * FROM table1,table2
2->>SELECT * FROM table WHERE foo = bar

You get the idea; this way I have a unique id for every query. I clean out the cache whenever an insert, delete, or update is done.

Now I'm sure you see the potential bottleneck here: every test, save, or fetch from the cache costs two requests to the file system (or three, when a new id has to be added). This may even defeat the purpose of caching altogether. So is there a way I can generate a unique id, i.e. a much shorter representation of the queries, in PHP without having to store them on the file system or in a database?


Solution

  • Strings are arbitrarily long, so obviously it's impossible to create a fixed-size identifier that can represent any arbitrary input string without duplication. However, for the purposes of caching, you can usually get away with a solution that's simply "good enough" and reduces collisions to an acceptable level.

    For example, you can simply use MD5: the chance that two given, distinct queries end up with the same hash is about 1 in 2^128. If you're still worried about collisions (and you probably should be, just to be safe) you can store the query and the result together in the "value" of the cache, and check when you get the value back that it's actually the query you were looking for.

    As a quick example (my PHP is kind of rusty, but hopefully you get the idea):

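    $query = "SELECT * FROM ...";
    
    // PHP concatenates strings with ".", not "+"; Zend_Cache ids may contain only [a-zA-Z0-9_]
    $key = "hash_" . hash("md5", $query);
    $result = $cache->load($key); // returns false on a cache miss
    if ($result === false || $result[0] !== $query) {
        // not in the cache (or a hash collision): do the real fetch and store it
        $result = $db->execute($query); // etc
    
        $result = array($query, $result);
        // storing an array needs the frontend's automatic_serialization option enabled
        $cache->save($result, $key);
    }
    
    // the result is now in $result[1] (the original query is in $result[0])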
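
The same idea can be tried out without Zend_Cache or a database. The following is only a minimal sketch under stated assumptions: `queryCacheId()` and `cachedQuery()` are hypothetical helper names, a plain PHP array stands in for the cache backend, and the closure stands in for the real database call. It assumes PHP 7 or later for the type declarations.

```php
<?php
// Hypothetical helpers; the array-backed cache is a stand-in for a real backend.

/** Build a short, fixed-length cache id from an arbitrarily long query string. */
function queryCacheId(string $query): string
{
    // md5() returns 32 hex characters, so the id uses only [a-z0-9_].
    return 'q_' . md5($query);
}

/**
 * Return the cached result for $query, calling $fetch only on a miss.
 * Each cache entry stores array(query text, result).
 */
function cachedQuery(array &$cache, string $query, callable $fetch)
{
    $id = queryCacheId($query);

    // A hit counts only if the stored query text matches (collision guard).
    if (isset($cache[$id]) && $cache[$id][0] === $query) {
        return $cache[$id][1];
    }

    $rows = $fetch($query);
    $cache[$id] = array($query, $rows);
    return $rows;
}

$cache = array();
$calls = 0;

// Stand-in for the real database call; counts how often it is invoked.
$fetch = function (string $query) use (&$calls) {
    $calls++;
    return array('rows for: ' . $query);
};

$first  = cachedQuery($cache, 'SELECT * FROM table', $fetch);
$second = cachedQuery($cache, 'SELECT * FROM table', $fetch);

echo strlen(queryCacheId('SELECT * FROM table')), "\n"; // 34
echo $calls, "\n";                                      // 1
```

The second call is served from the array, so the stand-in fetch runs only once. However long the query is, the id is always 34 characters.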