
Proper way to migrate documents in couchbase (API 1.4.x -> 2.0.x)


I would like to migrate documents persisted in Couchbase via API 1.4.10 to the new document types provided by API 2.0.5, such as JsonDocument. I found that it is possible to add custom transcoders to a Bucket, so that when decoding documents I can check the flags and decide which transcoder to use. But that doesn't seem like a good solution to me. Is there a proper way to do this? Thanks.

Migration can only be done at runtime, upon user request: there are too many documents to migrate them all at once in the background.


Solution

  • You don't need to use a custom transcoder to read documents created with the 1.x SDK. Instead, use the LegacyDocument type to read (and write) documents in legacy format.

    More importantly, you shouldn't keep running with a mix of legacy and new documents in the database for very long. The LegacyDocument type is provided to facilitate the migration from the old format to the new SDK. The best practice here is to deploy an intermediate version of your application which attempts to read documents in one format, then falls back on trying to read them in the other — legacy to new or vice versa, depending on which type of document is accessed more frequently at first.

    Once the intermediate version is deployed, run a background task that reads and converts all documents from the old format to the new. This is pretty straightforward: try to read each document as a LegacyDocument and, if that succeeds, store it right back as a JsonDocument using the CAS value you got from the read. If you can't read the document as legacy, it's already in the new format. The task should be throttled enough that it doesn't cause a large increase in database load.

    After the task finishes, remove the fallback code from the application and just read and write everything as JsonDocument.

    You mention having too many documents - how many is that? We've successfully migrated datasets with multiple billions of documents this way. This, admittedly, took several days to run. If you have a database that's larger than that, or has a very low resident ratio, it might not be practical to attempt to convert all documents.
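    The read-as-legacy, write-back-with-CAS loop described above can be sketched as follows. This is a hypothetical, self-contained simulation — the in-memory map, the `StoredDoc` class, and all method names are stand-ins, not the Couchbase SDK. With the real 2.x SDK you would read via `bucket.get(id, LegacyDocument.class)` and write back with something like `bucket.replace(JsonDocument.create(id, content, cas))`, where a stale CAS makes the replace fail instead of silently overwriting a concurrent update.

    ```java
    import java.util.HashMap;
    import java.util.Map;

    // Hypothetical in-memory sketch of the background migration task.
    // "bucket", "StoredDoc" and "replaceWithCas" are illustrative stand-ins
    // for the Couchbase bucket, its documents, and its CAS-checked replace.
    class MigrationSketch {

        static class StoredDoc {
            String content;
            boolean legacy;   // true = written by the 1.x SDK
            long cas;         // compare-and-swap token, bumped on every write
            StoredDoc(String content, boolean legacy, long cas) {
                this.content = content;
                this.legacy = legacy;
                this.cas = cas;
            }
        }

        static final Map<String, StoredDoc> bucket = new HashMap<>();

        // Succeeds only if the CAS we saw at read time is still current,
        // mirroring how a CAS-based replace rejects concurrent writes.
        static boolean replaceWithCas(String id, String newContent, long expectedCas) {
            StoredDoc d = bucket.get(id);
            if (d == null || d.cas != expectedCas) {
                return false; // document changed since we read it: skip, retry later
            }
            d.content = newContent;
            d.legacy = false; // now stored in the new (JsonDocument) format
            d.cas++;
            return true;
        }

        // One pass over all keys: convert every legacy document to the new format.
        static int migrateAll() {
            int converted = 0;
            for (Map.Entry<String, StoredDoc> e : bucket.entrySet()) {
                StoredDoc d = e.getValue();
                if (!d.legacy) {
                    continue; // already in the new format: nothing to do
                }
                long cas = d.cas; // CAS from the legacy read
                // (A real task would sleep here to throttle database load.)
                if (replaceWithCas(e.getKey(), d.content, cas)) {
                    converted++;
                }
            }
            return converted;
        }

        public static void main(String[] args) {
            bucket.put("a", new StoredDoc("{\"x\":1}", true, 10));
            bucket.put("b", new StoredDoc("{\"x\":2}", false, 20));
            System.out.println(migrateAll());            // 1
            System.out.println(bucket.get("a").legacy);  // false
        }
    }
    ```

    Documents that fail the CAS check are simply skipped and picked up on a later pass — since the intermediate application version can read both formats, a document left in the legacy format for one more pass is harmless.
    
    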