google-cloud-firestore google-cloud-datastore

It appears that entities in Datastore use their ancestor key as part of their location (their identity in the key space). Can this be relied on?

Here is my test setup, running in the Datastore emulator started with gcloud beta emulators datastore start --no-store-on-disk.

Using the NodeJS client, the setup is as follows. Note that, for the purposes of this example, I am using simple Kind + Name combinations for the ancestor. I am aware that the best-practices document discourages monotonically increasing custom names.

import { Datastore } from '@google-cloud/datastore';

// Jest test; the client picks up the emulator via DATASTORE_EMULATOR_HOST.
test('kind-only query and ancestor paths', async () => {
    const namespace = 'test';
    const datastore = new Datastore();
    const entities: any[] = [];
    const paths = [
        ['A', '1', 'Z', '1'],
        ['A', '2', 'Z', '1'],
        ['A', '3', 'Z', '1'],
        ['A', '4', 'Z', '1'],
    ];

    for (const path of paths) {
        const key = datastore.key({ path, namespace });
        const data = {
            text: 'Lorem Ipsum',
            path: key.path.toString(),
        };

        entities.push({ key, data });
    }

    const transaction = datastore.transaction();
    await transaction.run();

    transaction.upsert(entities);
    await transaction.commit();

    // Wait a second for things to persist.
    await new Promise<void>((resolve) => setTimeout(resolve, 1000));

    // Note that `hasAncestor` is **NOT** provided for this query.
    const query = datastore.createQuery(namespace, 'Z');

    const [results] = await datastore.runQuery(query);
    expect(results).toHaveLength(1); // fails, got 4 records back
});

If the ancestor path had no bearing on an entity's location, I would expect only 1 result when querying for all entities of kind Z. In my test, however, I get 4 results back. Note that the path is correct for each entity returned by the query:

[
    {
        "path": "A,1,Z,1",
        "text": "Lorem Ipsum"
    },
    {
        "path": "A,2,Z,1",
        "text": "Lorem Ipsum"
    },
    {
        "path": "A,3,Z,1",
        "text": "Lorem Ipsum"
    },
    {
        "path": "A,4,Z,1",
        "text": "Lorem Ipsum"
    }
]

So I want to confirm that this is indeed the correct behavior and not just an artifact of the emulator. If this is how things are supposed to work, it would follow that it may be OK to build a time series keyed by Unix timestamps, as long as the Kind + Name of the ancestor provides sufficient protection against collisions. In this case a UUID would likely suffice, provided the process requesting the writes does not write at a rate that would cause a timestamp collision. For this example, let's assume there is exactly 1 process per UUID, never more.

['A', '95a69d2f-adac-4da7-b1ab-134ca0e7a840', 'Z', '1000000005000']
['A', '95a69d2f-adac-4da7-b1ab-134ca0e7a840', 'Z', '1000000006000']
['A', '95a69d2f-adac-4da7-b1ab-134ca0e7a840', 'Z', '1000000007000']
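A minimal sketch of the key scheme described above, assuming Node's built-in `crypto.randomUUID` (Node 14.17+) for the per-process ancestor name and reusing the kinds `A` and `Z` from the example. The helper `timeSeriesPath` is hypothetical; it only builds key paths, which could then be passed as `path` to `datastore.key({ path, namespace })` as in the test above.

```typescript
import { randomUUID } from "crypto";

// Hypothetical helper: ancestor ('A', <process UUID>) identifies the writer,
// child ('Z', <timestamp>) identifies the data point within that writer.
function timeSeriesPath(processId: string, timestampMs: number): string[] {
  if (!Number.isSafeInteger(timestampMs) || timestampMs < 0) {
    throw new Error(`invalid timestamp: ${timestampMs}`);
  }
  // Key names are strings, so the numeric timestamp is converted explicitly.
  return ["A", processId, "Z", String(timestampMs)];
}

// Generated once at process start-up and reused for every write (assumption:
// one process per UUID, as stated in the question).
const processId = randomUUID();

const paths = [1000000005000, 1000000006000, 1000000007000].map((ts) =>
  timeSeriesPath(processId, ts),
);
console.log(paths);
```

Because names are compared as strings, timestamps with the same number of digits (as in the three example values) sort in the same order as the numbers they represent.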

Or is this still just a bad idea?

Solution

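This is the correct behaviour: an entity is keyed by the entire path of its key, i.e. the key includes all of its ancestor keys. So ['A', '1', 'Z', '1'] and ['A', '2', 'Z', '1'] are two different entities, even though both end in the same Kind + Name pair Z/1.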
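To illustrate why the kind-only query returns four entities, here is a toy in-memory model in TypeScript. The `KeySpaceModel` class is hypothetical and is not part of `@google-cloud/datastore`; it assumes, as stated above, that identity is the full key path, and it treats a kind-only query as matching every entity whose own kind is the requested one, regardless of its ancestors.

```typescript
// Illustrative model only: identity is the *entire* key path, ancestors included.
type KeyPath = readonly string[];

class KeySpaceModel {
  private readonly entities = new Map<string, Record<string, unknown>>();

  // JSON.stringify keeps paths unambiguous even if a name contains a comma.
  private static id(path: KeyPath): string {
    return JSON.stringify(path);
  }

  // Upsert: same full path overwrites, different full path creates a new entity.
  upsert(path: KeyPath, data: Record<string, unknown>): void {
    this.entities.set(KeySpaceModel.id(path), data);
  }

  // Kind-only query (no ancestor filter): the entity's own kind is the
  // second-to-last path element.
  queryKind(kind: string): KeyPath[] {
    return Array.from(this.entities.keys())
      .map((id) => JSON.parse(id) as string[])
      .filter((path) => path[path.length - 2] === kind);
  }

  get size(): number {
    return this.entities.size;
  }
}

const space = new KeySpaceModel();
for (const a of ["1", "2", "3", "4"]) {
  space.upsert(["A", a, "Z", "1"], { text: "Lorem Ipsum" });
}
// Same full path as the first upsert, so this overwrites rather than adds.
space.upsert(["A", "1", "Z", "1"], { text: "overwritten" });

console.log(space.size); // 4
console.log(space.queryKind("Z"));
```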

If you have a unique (per-process) prefix, then you do not need to worry about the monotonically increasing child names, because your prefix spreads the writes out across the key space. This should therefore be a scalable solution.
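To make the "spread out across the key space" argument concrete, here is a toy TypeScript model. It assumes keys live in an ordered key space compared element by element along the path, which is a simplification and not Datastore's real storage encoding. The writer IDs `p1`, `p2`, `p3` and the unprefixed `Z/<timestamp>-<writer>` naming are hypothetical. The model reports where the newest tick's writes land in sorted order.

```typescript
type KeyPath = string[];

// Simplified ordering: compare path elements left to right (NOT the real encoding).
function compareKeyPaths(a: KeyPath, b: KeyPath): number {
  const n = Math.min(a.length, b.length);
  for (let i = 0; i < n; i++) {
    if (a[i] !== b[i]) return a[i] < b[i] ? -1 : 1;
  }
  return a.length - b.length; // an ancestor sorts before its descendants
}

// Indices, in sorted key order, of the paths written in the most recent tick.
function latestPositions(all: KeyPath[], latest: KeyPath[]): number[] {
  const sorted = all.slice().sort(compareKeyPaths);
  const wanted = new Set(latest.map((p) => p.join("/")));
  const out: number[] = [];
  sorted.forEach((p, i) => {
    if (wanted.has(p.join("/"))) out.push(i);
  });
  return out;
}

// Hypothetical writers; equal-width timestamps so string order = numeric order.
const processes = ["p1", "p2", "p3"];
const ticks = [1000000005000, 1000000006000, 1000000007000];

const prefixed = (p: string, t: number): KeyPath => ["A", p, "Z", String(t)];
const unprefixed = (p: string, t: number): KeyPath => ["Z", `${t}-${p}`];

function simulate(makePath: (p: string, t: number) => KeyPath): number[] {
  const all: KeyPath[] = [];
  for (const t of ticks) for (const p of processes) all.push(makePath(p, t));
  const lastTick = ticks[ticks.length - 1];
  return latestPositions(all, processes.map((p) => makePath(p, lastTick)));
}

console.log("prefixed:  ", simulate(prefixed)); // [ 2, 5, 8 ]
console.log("unprefixed:", simulate(unprefixed)); // [ 6, 7, 8 ]
```

With the per-writer prefix, the newest writes are scattered (one per writer's range); without it, they all pile up at the tail of the key range, which is the hotspot pattern the best-practices guidance warns about.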