Search code examples
google-cloud-datastore

Are built-in index based ancestor queries efficient?


The indexes doc at https://cloud.google.com/datastore/docs/concepts/indexes says that built-in single property indexes can support

  • Queries using only ancestor and equality filters

  • Queries using only inequality filters (which are limited to a single property)

Since the built-in index for the property is sorted by the property value, I understand how it supports a single inequality filter. However, how is it able to support the equality filter with ancestor query? Say I have a million rows for the same property value, but the given ancestor condition only matches 100 rows within those million rows, would it have to scan all the million rows to find the 100 matching rows? I don't think that's the case as some where I read that Cloud Datastore scales with the number of rows in the result set and not the number of rows in the database. So, unless the single property index is internally a multi-column index with first column as the property and the second column as the entity key, I don't see how these ancestor + equality queries can be efficiently supported with built-in single property queries.


Solution

  • Cloud Datastore built-in indexes are always split into a prefix and a postfix at query time. The prefix portion is the part that remains the same (eg equalities or ancestors), the postfix portion is the part that changes (sort order).

    Builtin indexes are laid out:

    Kind, PropertyName, PropertyValue, Key

    For example, a query: FROM MyKind WHERE A > 1

    Would divide the prefix/postfix as:

    MyKind,A | range<1, inf>

    In the case you're asking about (ancestor with equality), FROM MyKind WHERE __key__ HAS ANCESTOR Key('MyAncestor', 1) AND A = 1 the first part of the prefix is easy:

    MyKind,A,1

    To understand the ancestor piece, we have to consider that Datastore keys are a hierarchy. In the case of MyKind, the keys might looks like: (MyAncestor, 1, MyKind, 345).

    This means we can make the prefix for an ancestor + equality query as:

    MyKind,A,1,(MyAncestor, 1)

    The postfix would then just be all the keys that have (MyAncestor,1) as a prefix and A=1.

    This is why you can have an equality with an ancestor using the built-in indexes, but not an inequality with an ancestor.

    If you're interested, the video Google I/O 2010 - Next gen queries dives into this in depth.