php memcached igbinary

delimiting blocks of igbinary data


I am stashing chunks of log data in memcache to later throw into a database. On each request to the server I save an array of data using memcached::append(), using newlines to delimit the chunks. A simplified version would look like this:

$myCache->append('log', serialize($myArray)."\n");

Later, when I want to build my query, I pull all the rows out of the cache and unserialize each one:

$dataToInsert = explode("\n", $myCache->get('log'));
$dataToInsert = array_map(function($row) {
    return unserialize($row);
}, $dataToInsert);

This works fine with the built-in serialize() and unserialize(), but I'd like to take advantage of igbinary's obvious strengths - size and speed. Unfortunately when I substitute the igbinary versions of the functions, I get errors.

It appears that the igbinary-serialized data can contain "\n" characters, so when I explode the stashed data it creates partial rows that of course fail.

Is there a delimiter that I can use besides newline to separate the blocks of igbinary data, or are igbinary and append() fundamentally incompatible?


Solution

  • Since igbinary stores binary data as-is, there is no guarantee of any character being available for use: you can serialize a string or integer containing any byte, any character.

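    memcached itself only works on whole values: it can add, replace, and delete them, and append or prepend to strings. It has no notion of separate records inside one value, so the delimiting has to be done by you.

    Two ways come to mind to keep the logged data in memcached until it is time to run the SQL query: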

    • use multiple keys: 'log1', ..., 'logN' and keep track of N.
    • reserve a character for yourself by escaping the binary output of the serialization (and unescaping before deserialization).
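    A minimal sketch of the first option, assuming $cache is a connected Memcached instance. The key names 'log_count' and 'log_<n>' and both function names are made up for illustration. As far as I know, set() serializes arrays with whichever serializer the client is configured for, so igbinary can be selected via Memcached::OPT_SERIALIZER if the extension was built with it.

    ```php
    <?php
    // Sketch only: one key per log entry plus a counter key, so no delimiter is needed.
    // The key names and function names here are hypothetical.

    function stashLogEntry($cache, array $entry): void
    {
        // add() stores the counter only if it does not exist yet.
        $cache->add('log_count', 0);
        // increment() returns the new counter value, which becomes the entry number.
        $n = $cache->increment('log_count');
        // The client serializes the array itself; no manual serialize() call here.
        $cache->set('log_' . $n, $entry);
    }

    function fetchLogEntries($cache): array
    {
        $count = (int) $cache->get('log_count');
        $keys = [];
        for ($i = 1; $i <= $count; $i++) {
            $keys[] = 'log_' . $i;
        }
        // getMulti() returns the found items keyed by name, or false on failure.
        $found = $keys ? $cache->getMulti($keys) : [];
        return is_array($found) ? array_values($found) : [];
    }
    ```

    This sketch does not delete the keys after reading and does not guard against entries being written while the batch is read, so treat it as a starting point only.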

    The reservation could be done like this:

    str_replace( "\n", "\n1", $data ) . "\n0"
    

    This makes sure that every \n in the output is followed by either a 0 or a 1: \n1 stands for a literal newline inside the data, and \n0 marks the end of a block.

    I'm not replacing \n with \n\n because the result would be ambiguous if $data starts or ends with \n.
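    To make that ambiguity concrete, here is a small check. The helper encodeDoubled() is hypothetical and only exists to show the rejected scheme: double every "\n" and terminate each block with a single "\n".

    ```php
    <?php
    // Hypothetical helper showing the rejected scheme:
    // double every "\n" in the data, then end the block with one "\n".
    function encodeDoubled(string $data): string
    {
        return str_replace("\n", "\n\n", $data) . "\n";
    }

    // Two different pairs of blocks...
    $first  = encodeDoubled("a\n") . encodeDoubled("b");
    $second = encodeDoubled("a") . encodeDoubled("\nb");

    // ...produce the same byte sequence, so the reader cannot tell them apart.
    var_dump($first === $second);
    ```

    Both concatenations come out as "a\n\n\nb\n", so there is no way to decide where the first block ends.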

    So:

    $myCache->append('log', str_replace("\n", "\n1", igbinary_serialize($myArray)) . "\n0");
    

    Splitting the data is then done using \n0, and the \n1 is unescaped back to \n:

    $rows = explode("\n0", $myCache->get('log'));
    array_pop($rows); // every block ends with "\n0", so the last element is always empty
    $dataToInsert = array_map(function($row) {
        return igbinary_unserialize(str_replace("\n1", "\n", $row));
    }, $rows);
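
    The escaping itself does not depend on memcached or igbinary, so it can be checked on plain byte strings. The sketch below does that. The names escapeRecord() and splitRecords() are hypothetical helpers, and the payloads are awkward cases picked by hand rather than real igbinary output.

    ```php
    <?php
    // Hypothetical helpers wrapping the escaping scheme described above.

    // Escape literal newlines as "\n1" and terminate the block with "\n0".
    function escapeRecord(string $data): string
    {
        return str_replace("\n", "\n1", $data) . "\n0";
    }

    // Split a concatenated log back into the original byte strings.
    function splitRecords(string $log): array
    {
        if ($log === '') {
            return [];
        }
        $rows = explode("\n0", $log);
        array_pop($rows); // the log ends with "\n0", so the last element is empty
        return array_map(function (string $row): string {
            return str_replace("\n1", "\n", $row);
        }, $rows);
    }

    // Payloads that start or end with "\n", or contain the marker sequences themselves.
    $payloads = ["plain", "\nleading", "trailing\n", "a\n0b\n1c", "\n"];

    $log = '';
    foreach ($payloads as $payload) {
        $log .= escapeRecord($payload); // stands in for Memcached::append()
    }

    var_dump(splitRecords($log) === $payloads);
    ```

    In real use, the string passed to escapeRecord() would be the output of igbinary_serialize(), and each element returned by splitRecords() would go through igbinary_unserialize().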