google-app-engine, app-engine-ndb, bulkloader

How to download all GAE datastore records?


I use the GAE bulk loader to download datastore data:

appcfg.py download_data --log_file=bulkloader.log --kind=MyKind --application=s~myappid --url=http://myappid.appspot.com/rmt_api --filename=data_downloaded.csv --db_filename=skip --config_file=bulkloader.yaml

But after some time I get OverQuotaError: The API call datastore_v3.RunQuery() required more quota than is available. This refers to the Datastore Read Operations quota. It looks like I need to download some of the data on day 1, some more on day 2, day 3, and so on.

How can I do it?

Update: the documentation says:

If the transfer is interrupted, you can resume the transfer from where it left off using the --db_filename=... and --result_db_filename=... arguments. These arguments are the names of the progress file and the results file created by the tool, which are either names you provided with the arguments when you started the transfer, or default names that include a timestamp. This assumes you have sqlite3 installed, and did not disable progress files with --db_filename=skip.

Does this mean that I can run appcfg.py download_data ... several times, passing the same db_filename and result_db_filename values, and that it will continue downloading the remaining records each time? What will happen to my CSV file? Will records be appended to the end of the file?


Solution

  • Using db_filename and result_db_filename makes it possible to spread the download over several days. Once the daily limit is reached, stop the download and start it again the next day with the same two values (not --db_filename=skip, which disables the progress file). It will not download the same data again but will continue with the remaining items. The CSV file is created once all data has been downloaded (i.e. on the last day).
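
As a rough sketch of what the resumable invocation might look like, the script below prints the command from the question with --db_filename=skip replaced by fixed progress and results file names. The names bulkloader-progress.sql3 and bulkloader-results.sql3 and the helper download_cmd are made up for illustration. Any names should work as long as the same ones are passed on every run. All other flags are copied from the question.

```shell
#!/bin/sh
# Hypothetical file names (assumptions, not defaults of the tool).
# Reuse the SAME two names on every run so the transfer can resume.
PROGRESS_DB="bulkloader-progress.sql3"
RESULT_DB="bulkloader-results.sql3"

# Print the full download command on one line.
download_cmd() {
  printf '%s ' \
    appcfg.py download_data \
    --log_file=bulkloader.log \
    --kind=MyKind \
    --application=s~myappid \
    --url=http://myappid.appspot.com/rmt_api \
    --filename=data_downloaded.csv \
    --db_filename="$PROGRESS_DB" \
    --result_db_filename="$RESULT_DB" \
    --config_file=bulkloader.yaml
  printf '\n'
}

# Show the command. To actually run it, use:
#   eval "$(download_cmd)"
# and repeat on later days, after the read quota has reset.
download_cmd
```

Printing the command rather than executing it keeps the sketch safe to run anywhere. Per the documentation quoted in the question, resuming also assumes sqlite3 is installed.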