google-app-engine, database-migration, google-cloud-datastore

Handling Schema Migrations in App Engine


I've updated several of the heavily-used NDB models in one of my App Engine applications by moving a few properties between them. As a result, some models now contain data that previous versions of my application would look for elsewhere (and fail to find). I have updated my handlers so that this will no longer happen once I run the migration that will accompany my release.

However, I am concerned about what will happen while the schema is migrating. I've tested the migration locally with a few hundred entities, and the task took about 1 second (using deferred tasks). As I have well over 1000 entities in production, I expect the task will take at least a few seconds there, and in the meantime I'm sure that users will experience service problems.

From other questions (such as this one), I've gathered that a good practice is to redirect users to a page that alerts them of planned downtime for maintenance. However, this isn't really an option for me -- we need to keep our uptime as high as possible.

Thus, my question is this: is there a way to keep my application online and still execute a potentially-lengthy migration? I was thinking about using App Engine's "Traffic Splitting" to keep users on the old app version while the new app version migrates, but this will still cause a service problem.


Solution

  • You could do 2 migrations (add + delete) instead of a single (move) one.

    A first migration just adds the properties to be moved into their new locations in the models.

    Update the code that reads the properties to check the new locations first and, if the properties don't exist there, fall back to the old locations. Queries will likely need to be doubled to cover the two locations, with added logic to identify and skip duplicate results. This is essentially a combination of the new and the old code.
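A minimal sketch of this transitional read path, using a hypothetical `phone` property that moved from a `Profile` model to a `Contact` model (all names here are illustrative, not from the question). Plain objects stand in for NDB entities so the sketch is self-contained; the same logic applies unchanged to real entities:

```python
def read_phone(contact, profile):
    """Prefer the new location; fall back to the old one if it's empty."""
    phone = getattr(contact, 'phone', None) if contact is not None else None
    if phone is not None:
        return phone
    return getattr(profile, 'phone', None) if profile is not None else None


def merge_query_results(new_results, old_results, key_fn):
    """Combine the doubled query's results, skipping duplicates:
    entities that are already migrated can show up in both result sets."""
    seen, merged = set(), []
    for entity in list(new_results) + list(old_results):
        key = key_fn(entity)
        if key not in seen:
            seen.add(key)
            merged.append(entity)
    return merged
```

With NDB entities, `key_fn` would typically be `lambda e: e.key`, since the entity key uniquely identifies duplicates across the two queries.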

    Then update the code that writes the properties to use the new locations. Entities created from this point on will no longer have the old locations. If you want to play it even safer (keeping the ability to roll back to older code versions), or if you want to keep your older app versions running a little while longer, make the writing code write to both the old and the new locations.
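The transitional dual write can be as simple as the sketch below (same hypothetical `phone` property as above; with real NDB entities each touched entity would then be saved with `put()`):

```python
def write_phone(contact, profile, value):
    """Always write the new location; mirror into the old location too,
    so older app versions and a code rollback can still see the data."""
    contact.phone = value
    if profile is not None:
        profile.phone = value  # remove this mirror write after the migration
```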

    Then run a one-time job copying the attribute contents from the old locations (if they exist) to the new locations. This ensures all properties are present in the new locations for all entities.
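A sketch of the one-time copy pass, shown for brevity as a property rename on a single entity (attribute names are hypothetical). On App Engine this would run over query batches in deferred tasks, each batch resuming from a query cursor, with the returned entities re-saved via `ndb.put_multi()`:

```python
def copy_old_to_new(entities, old_attr='phone_old', new_attr='phone'):
    """Copy data into the new location for entities that only have it in
    the old one; return the entities that changed (and need re-saving)."""
    changed = []
    for entity in entities:
        old_value = getattr(entity, old_attr, None)
        if old_value is not None and getattr(entity, new_attr, None) is None:
            setattr(entity, new_attr, old_value)
            changed.append(entity)
    return changed
```

Because the copy only fills in missing new-location values, it is idempotent: re-running it (e.g. after a deferred task retry) is harmless.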

    At this moment you can drop all code accessing the old locations.

    This is the point of no return. Before the next step you need to stop any GAE instance running your older app code - it won't run properly anymore, and you won't be able to roll back your code without another DB-updating job.

    Then run a one-time cleanup job removing the properties from the old locations for all entities - they're no longer accessed by the latest version of the code.
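The cleanup pass mirrors the copy pass (same hypothetical attribute names). Note that removing a property from an NDB model class does not by itself delete the values already stored in Datastore; a pass like this has to clear them and re-save the entities:

```python
def clear_old_location(entities, old_attr='phone_old'):
    """Blank out the old location so the property (and its stored data)
    can be dropped; return how many entities were touched and need
    re-saving."""
    cleaned = 0
    for entity in entities:
        if getattr(entity, old_attr, None) is not None:
            setattr(entity, old_attr, None)
            cleaned += 1
    return cleaned
```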

    Finalize with a 2nd migration deleting the properties from the old location in the models.

    It might be a lot of work, but it should be possible to achieve zero downtime (for the latest app version) irrespective of the duration of the migrations.

    Update:

    A potentially even better migration algorithm (one that avoids the problematic doubled, OR'd queries) is described in How to rename a Datastore entity field but be able to retrieve records via old and new property names?