I am currently looking for the best way to back up my Google App Engine webapp's datastore. From the reading I have been doing, it seems there are two ways to do this. I can either use the bulkloader by calling appcfg.py download_data --application= --kind= --filename=, or I can go into my webapp's Datastore Admin section, select the entities I want to back up, and click the "Backup Entities" button. Unless I am mistaken, the "Backup Entities" button will create a backup of my datastore in the Blobstore, while appcfg.py's download_data will create a local backup.

The backups will happen weekly or monthly, and the primary reason is to recover if one of the webapp's admins accidentally deletes important data. I am not worried about Google losing data, so that should not be taken into consideration when reviewing my question.
So my question is: which of these two methods is preferred? Which is faster, more efficient, cheaper, and so on?
Thanks in advance for your comments/help/answers.
Here are some factors to consider, along with the solution I think handles each best:
Dev Time - Datastore Admin - To leverage the Bulkloader, you'll need to write scripts and maintain backup servers, storage, etc. Datastore Admin requires no code at all.
Cost - Datastore Admin - YMMV, but our backup of tens of millions of entities used <1% of the 1 billion Task Queue quota. The cost for datastore read operations and storage will be specific to your application, but between the two options the read operations should be the same, and you're trading Outgoing Bandwidth ($0.12/GB) with the Bulkloader for Blobstore storage ($0.0043/GB) with Datastore Admin.
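To make that bandwidth-vs-storage tradeoff concrete, here's a quick back-of-the-envelope calculation using the per-GB rates above (the 50 GB backup size is a made-up figure purely for illustration):

```python
# Rough cost comparison for a single backup, using the per-GB rates
# quoted above. The 50 GB backup size is an assumed example figure.
backup_size_gb = 50

bulkloader_bandwidth_rate = 0.12       # $/GB Outgoing Bandwidth
datastore_admin_storage_rate = 0.0043  # $/GB Blobstore storage

bulkloader_cost = backup_size_gb * bulkloader_bandwidth_rate
datastore_admin_cost = backup_size_gb * datastore_admin_storage_rate

print("Bulkloader bandwidth cost:   $%.2f" % bulkloader_cost)
print("Datastore Admin storage cost: $%.2f" % datastore_admin_cost)
```

At those rates the egress cost of a Bulkloader download is roughly 28x the monthly Blobstore storage cost for the same data, which is why Datastore Admin wins this factor.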
Backup Duration - Datastore Admin - As you would expect, mapreduce shards writing data to the Blobstore inside Google's network are much, much faster than streaming entity data out one at a time. A full backup of our data with Datastore Admin takes under 6 hours; with the Bulkloader it takes over 3 days.
Backup Maintenance - Bulkloader (for now) - With the Bulkloader and a server, you can create crons to regularly perform backups and backup maintenance. For example, we have a server at Rackspace that backs up our datastore every 3 days and keeps the last 2 backups. With Datastore Admin you have to perform the backup and delete stale backups manually, until an automated solution is published (Issue 7040). Even so, for once-a-month backups, the cost of doing it manually with Datastore Admin is so low that I'd still recommend it.
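For the Bulkloader route, the retention half of that cron job can be a small script. Here's a sketch under assumed conventions (timestamped directories like backup-20120101 whose lexicographic order matches chronological order; the layout and keep-count are illustrative, not something from our actual setup):

```python
import os
import shutil

def prune_backups(backup_root, keep=2):
    """Delete all but the `keep` newest backup directories.

    Assumes each backup lives in its own directory under backup_root,
    named so that lexicographic order matches chronological order,
    e.g. backup-20120101, backup-20120104, ...
    Returns the names of the backups that were kept.
    """
    backups = sorted(
        d for d in os.listdir(backup_root)
        if d.startswith("backup-")
        and os.path.isdir(os.path.join(backup_root, d))
    )
    for stale in backups[:-keep]:
        shutil.rmtree(os.path.join(backup_root, stale))
    return backups[-keep:]
```

The cron entry would then run appcfg.py download_data into a fresh timestamped directory and call prune_backups() afterwards.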
Data Flexibility - Bulkloader - With the Bulkloader you can export all your data into human-readable CSV files, allowing you to pivot it in Excel, create a test dataset for your local development environment, or even move your operation to another app hosting service (e.g., AWS) should you require it.
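As one illustration of that flexibility, here's a sketch of trimming a full Bulkloader CSV export down to a small fixture for the local development server (file paths and the row limit are invented for the example):

```python
import csv

def sample_csv(src_path, dest_path, limit=100):
    """Copy the header row plus the first `limit` data rows of a CSV
    export -- handy for turning a full Bulkloader dump into a small
    test dataset for local development."""
    with open(src_path, newline="") as src, \
         open(dest_path, "w", newline="") as dest:
        reader = csv.reader(src)
        writer = csv.writer(dest)
        writer.writerow(next(reader))  # header row
        for i, row in enumerate(reader):
            if i >= limit:
                break
            writer.writerow(row)
```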
Precision Restore - Bulkloader - The Bulkloader can handle both restoring select entities (where you know exactly which entities you deleted or overwrote) and bulk restores. Datastore Admin can only do a bulk restore of all the entities of a given Kind.
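A sketch of what that selective restore might look like: filter the CSV backup down to just the keys you know were lost, then feed the filtered file back through appcfg.py upload_data. The key column name and file paths here are assumptions for the example, not part of any fixed Bulkloader format:

```python
import csv

def filter_backup(src_path, dest_path, wanted_keys, key_column="key"):
    """Write only the backup rows whose key column value is in
    `wanted_keys`, preserving the header, so the result can be
    re-uploaded without touching any other entities."""
    with open(src_path, newline="") as src, \
         open(dest_path, "w", newline="") as dest:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dest, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            if row[key_column] in wanted_keys:
                writer.writerow(row)
```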
Bulk Restore - Datastore Admin - Datastore Admin minimizes very expensive writes by only updating changed entities. Sharding also makes the process much, much faster than a simple Bulkloader upload (though you could shard the CSV backup data across many clients yourself).
Ultimately the Bulkloader gives you more precise control, while Datastore Admin simplifies and speeds up bulk backup/restore. Even though Datastore Admin is new and has a few issues (Issue 7076), given your situation I'd definitely recommend it.