c# .net serialization membase enyim

Need help on versioning/migrating data


I'm working on a project where I'll be using Membase (think Memcached + persistence) as our persistence layer with a multi-node cluster. We're using the Enyim client to talk to the cache and we're using binary serialization to serialize/deserialize the objects to and from the cache.

One of our concerns is how to manage changes to our data model effectively. With a normal SQL database, we could run an update script to update the tables.

With Membase and cached binary objects, we COULD grab all the cached objects and load both binaries:

  1. the version of the code that was used to serialize the cached objects
  2. the new version of the code, which defines different properties

and migrate the data that way, but that's hardly desirable when we could potentially have tens of millions of objects in the cache. Ideally we'd like to migrate data only when necessary, with some iterative process that migrates version 1 data to version 2, then to version 3, and so on, but I struggle to think of a way to do this with binary data.

Just a shot in the dark: has anyone had experience dealing with this kind of problem before? We're more than happy to use other forms of serialization, and could simply store string data (compressed, maybe) in the cache and handle the serialization ourselves.

Thanks,


Solution

  • Consider a repair-on-read paradigm: the new version of your library knows how to recognize V1 and V2 objects and picks the appropriate deserializer based on the version each object was stored as, then reserializes V1 objects in V2 format after touching them.

    That way there's no need to batch-update all of your objects, but you will eventually migrate all of them to V2 format. If needed, you can also run a background process that slowly fetches V1 objects and converts them to V2, so that the repair-on-read algorithm doesn't eventually have to deal with every version from V1 through Vn.
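
    To make the idea concrete, here is a minimal sketch of repair on read in C#. Everything in it is an assumption for illustration: the `Customer` model, the hand-rolled one-byte version prefix, and the `ICacheStore` interface with its in-memory implementation, which stands in for whatever cache client you use (it is not the Enyim API). The V1-to-V2 change (one `Name` string split into first and last name) is invented as well.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical current (V2) model. V1 stored one "Name" string instead.
public sealed class Customer
{
    public string FirstName = "";
    public string LastName = "";
}

// Hypothetical stand-in for the cache client; only get/set by key is assumed.
public interface ICacheStore
{
    bool TryGet(string key, out byte[] value);
    void Set(string key, byte[] value);
}

public sealed class InMemoryStore : ICacheStore
{
    private readonly Dictionary<string, byte[]> _items = new Dictionary<string, byte[]>();
    public bool TryGet(string key, out byte[] value) { return _items.TryGetValue(key, out value); }
    public void Set(string key, byte[] value) { _items[key] = value; }
}

public static class CustomerCodec
{
    public const byte CurrentVersion = 2;

    // Every payload starts with one version byte, followed by that version's fields.
    public static byte[] Serialize(Customer c)
    {
        using (var ms = new MemoryStream())
        using (var w = new BinaryWriter(ms))
        {
            w.Write(CurrentVersion);
            w.Write(c.FirstName);
            w.Write(c.LastName);
            w.Flush();
            return ms.ToArray();
        }
    }

    // Writes the legacy V1 layout; present only so the sketch can produce old data.
    public static byte[] SerializeV1(string name)
    {
        using (var ms = new MemoryStream())
        using (var w = new BinaryWriter(ms))
        {
            w.Write((byte)1);
            w.Write(name);
            w.Flush();
            return ms.ToArray();
        }
    }

    // Picks the deserializer from the stored version byte and reports that version.
    public static Customer Deserialize(byte[] data, out byte storedVersion)
    {
        using (var r = new BinaryReader(new MemoryStream(data)))
        {
            storedVersion = r.ReadByte();
            var c = new Customer();
            switch (storedVersion)
            {
                case 1:
                {
                    // Upgrade step: split the single V1 name at the first space.
                    string[] parts = r.ReadString().Split(new[] { ' ' }, 2);
                    c.FirstName = parts[0];
                    c.LastName = parts.Length > 1 ? parts[1] : "";
                    return c;
                }
                case 2:
                    c.FirstName = r.ReadString();
                    c.LastName = r.ReadString();
                    return c;
                default:
                    throw new InvalidDataException("Unknown version " + storedVersion);
            }
        }
    }
}

public sealed class CustomerRepository
{
    private readonly ICacheStore _store;
    public CustomerRepository(ICacheStore store) { _store = store; }

    // Repair on read: anything stored in an older format is written back as V2.
    public Customer Load(string key)
    {
        byte[] raw;
        if (!_store.TryGet(key, out raw)) return null;
        byte version;
        Customer customer = CustomerCodec.Deserialize(raw, out version);
        if (version < CustomerCodec.CurrentVersion)
            _store.Set(key, CustomerCodec.Serialize(customer));
        return customer;
    }

    // Background pass: loading each key upgrades it as a side effect.
    // Returns how many of the supplied keys were found in the store.
    public int MigrateAll(IEnumerable<string> keys)
    {
        int found = 0;
        foreach (string key in keys)
            if (Load(key) != null) found++;
        return found;
    }
}
```

    With more versions you would probably chain the upgrade steps (V1 to V2, V2 to V3, and so on) instead of converting each old version straight to the newest one. Note also that the write-back in `Load` can race with a concurrent update of the same key, so a real implementation may want a compare-and-swap style store. `MigrateAll` assumes you can supply the keys yourself, since the sketch does not assume the cache can enumerate them.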