google-cloud-datastore google-app-engine-python app-engine-flexible

Where does `gcloud.datastore` persist its local dev state, and how do I clear it?


I am experimenting with Google App Engine's flexible Python 3 environment and Cloud Datastore. When testing locally, this (generally) calls for running your app in something like Gunicorn and accessing the Datastore API from gcloud.datastore. For example:

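import gcloud.datastore as g_datastore

# Client arguments and key path elided, as in my real code
ds = g_datastore.Client(...)
# Entity lives on the aliased module (g_datastore), not a bare "datastore" name
entity = g_datastore.Entity(key=ds.key(...))
ds.put(entity)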

When run locally (in dev mode), Entities' states are persisted between runs. I can't for the life of me figure out where they are stored or how to clear the dev datastore that is created after creating/accessing gcloud.datastore.Client. As far as I can tell, it does not use the same place that ndb uses when run via dev_appserver.py.

I've tried to figure it out with something like this (when running OS X):

$ touch foo
$ GCLOUD_PROJECT=... python .../main.py
 * Running on http://127.0.0.1:8080/ (Press CTRL+C to quit)
 * Restarting with stat
 * Debugger is active!
 * Debugger pin code: ...
127.0.0.1 - - [04/Jul/2016 10:36:01] "GET / HTTP/1.1" 200 -
...
^C
$ sudo find /private/tmp /var/db /var/tmp ~/.config/gcloud ~/Library -newer foo
...
# nothing meaningful

I tried looking at the source code and found some unit test cleanup code that: a) isn't distributed with pip install gcloud; and (more importantly for me) b) doesn't give any clue as to where that stuff is actually stored.

I've even tried this while Gunicorn was running:

$ sudo lsof | grep -Ei 'python'
# nothing meaningful

Where the foo does gcloud.datastore store its state between runs when run locally (in dev mode)?!


Solution

  • Boy do I feel silly! 😖 By default, gcloud.datastore connects to ... (wait for it) ... the Google Cloud Datastore. The real one. I don't know why I expected any different.

    I didn't figure this out right away because my local gcloud configuration was already provisioned to use my account credentials, and I had the GCLOUD_PROJECT environment variable set when running my local instance. Whoops! 😳 (No wonder I wasn't seeing any changes on local disk!)

    So, if you want to have a "dev" Cloud Datastore running locally, you'll need to run the Datastore emulator. This is more complicated than running dev_appserver.py (which pretty much takes care of all this for you; see, e.g., this workflow for how to infer values for your index.yaml file from your app's Datastore calls). If you don't supply the --data-dir option to the start command, the default local storage location is ~/.config/gcloud/emulators/datastore/....

    Rather than delete the question, I'm leaving it here as a warning/explanation to numbskulls like myself.
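
To avoid repeating my mistake, here is a minimal sketch of a guard you could call before creating a Client. It assumes the emulator was started with something like `gcloud beta emulators datastore start` and that its `env-init` output was applied to your shell, which (as far as I know) exports DATASTORE_EMULATOR_HOST. I believe recent client libraries read that variable, but older gcloud releases may use a different name, so check your version. The helper names `datastore_target` and `require_emulator` are made up for illustration and are not part of any library.

```python
import os

# Assumption: the emulator's env-init output and your client library
# both use this variable name. Verify against your installed version.
EMULATOR_HOST_VAR = "DATASTORE_EMULATOR_HOST"


def datastore_target(environ=None):
    """Describe where a Datastore client would most likely connect."""
    environ = os.environ if environ is None else environ
    host = environ.get(EMULATOR_HOST_VAR, "").strip()
    if host:
        return "emulator at " + host
    # With no emulator host set, default credentials point at the real service.
    return "production Cloud Datastore"


def require_emulator(environ=None):
    """Hypothetical safety guard: raise unless the emulator variable is set."""
    environ = os.environ if environ is None else environ
    host = environ.get(EMULATOR_HOST_VAR, "").strip()
    if not host:
        raise RuntimeError(
            EMULATOR_HOST_VAR + " is not set; refusing to touch the real Datastore"
        )
    return host


if __name__ == "__main__":
    print("Datastore calls would go to:", datastore_target())
```

Calling `require_emulator()` at the top of a local-only entry point makes the app fail fast instead of quietly writing to the real Datastore. It only inspects the environment, so it cannot tell whether the emulator is actually running.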