
Projection Queries vs. Regular Queries


Reading the Cloud Datastore documentation, it's pretty clear that a Get retrieves the entire entity and that a query is always done against an index.

Here's where it gets confusing, because according to the documentation:

Projection queries allow you to query Cloud Datastore for just those specific properties of an entity that you actually need, at lower latency and cost than retrieving the entire entity

But if all queries are done against an index, wouldn't a query never retrieve the entire entity anyway, since everything is taken from the index? If I have an index with three properties (A, B, and C), there shouldn't be any difference between a regular query and a projection query: I can't "project" just A and B, because the index also contains C, so in this example wouldn't I have to "project" the same properties as the regular query?

It seems to me that projection queries are only really useful when an index doesn't contain all the properties of an entity? So if an entity doesn't have many properties, does it make any sense to use projection, given that most (if not all) queries will be retrieving all the properties anyway? I'm wondering whether it makes sense cost-wise to use projection in this case: the documentation says projection has lower latency and cost, but does that still apply when the projection grabs all the properties?


Solution

  • Per this article (probably the same one you're looking at), projection queries require all properties specified in the projection to be included in a Cloud Datastore index.

    Whereas "regular" queries (which I'm taking to mean SELECT * ...) against Cloud Datastore typically use indexes that contain only a sorted subset of the properties of the queried entities, plus pointers to the full entities, projection queries run against indexes that contain all the fields requested by the query. So the significant latency gain appears to come from eliminating the need to fetch the queried entities once the set of matching entities has been determined via the index.

    So when you write "if all queries are done against an index, a query would never retrieve the entire entity anyway as everything is taken from the index?", that's not accurate. Non-projection queries are going to:

    1. Determine the index needed to efficiently fulfill the query (if the index doesn't exist, an exception will be raised)
    2. Use it to get pointers to the matching entities, and
    3. Fetch those matching entities

    As far as I can tell, apart from keys-only queries (which return only the entity keys, not property values), projection queries are the only mechanism Cloud Datastore provides for fulfilling a query using only an index (without step #3 above).

    I haven't read any documentation suggesting that, even if you configure an index containing all the properties across all entities of a particular Datastore kind (which would be unusual), the query engine would be "smart" enough to use that index to fulfill queries without step #3, even where that is technically possible. Indeed, since Cloud Datastore is schemaless and entities of the same kind can have different properties, even determining whether an index contains all the properties of a given entity, without fetching that entity, would be a much more involved task than in a database with a fixed schema.
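The latency argument about index pointers versus full fetches can be made concrete with a toy in-memory model in Python. This is a sketch under loud assumptions, not the Cloud Datastore API or its actual storage layout: `EntityStore`, `Index`, `regular_query`, and `projection_query` are hypothetical names, and the fetch counter simply stands in for the extra entity lookups a non-projection query pays for.

```python
from dataclasses import dataclass, field


@dataclass
class EntityStore:
    """Hypothetical entity storage keyed by entity key; counts lookups."""
    entities: dict
    fetches: int = 0

    def get(self, key):
        # Each call stands in for one extra lookup of a full entity.
        self.fetches += 1
        return self.entities[key]


@dataclass
class Index:
    """Hypothetical sorted index: each row is (property values..., key)."""
    properties: tuple
    rows: list = field(default_factory=list)

    @classmethod
    def build(cls, properties, entities):
        rows = []
        for key, entity in entities.items():
            # Only entities that have every indexed property get a row.
            if all(p in entity for p in properties):
                rows.append(tuple(entity[p] for p in properties) + (key,))
        rows.sort()
        return cls(tuple(properties), rows)


def regular_query(index, store):
    """Toy 'SELECT *': the index yields keys, then each entity is fetched."""
    keys = [row[-1] for row in index.rows]
    return [store.get(key) for key in keys]


def projection_query(index, projected):
    """Toy projection: every requested value is read from the index rows."""
    # Loosely mirrors the rule that projected properties must be indexed.
    missing = set(projected) - set(index.properties)
    if missing:
        raise ValueError(f"index does not contain {sorted(missing)}")
    positions = [index.properties.index(p) for p in projected]
    return [{p: row[i] for p, i in zip(projected, positions)} for row in index.rows]
```

For example, with two entities indexed on `("priority", "done")`, `regular_query` performs two fetches while `projection_query(index, ("priority", "done"))` performs none; projecting a property the index lacks, such as `description`, fails instead of silently fetching.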
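The three steps a non-projection query goes through (find an index, collect pointers, fetch entities) can be sketched as a small Python function. Again this is a hypothetical toy, not Datastore internals: `NeedIndexError`, `build_index`, and `run_query` are illustrative names, only equality filters are supported, and unlike real Cloud Datastore (which maintains single-property indexes automatically) it only knows about the indexes passed in explicitly.

```python
class NeedIndexError(Exception):
    """Hypothetical stand-in for the error raised when no index fits."""


def build_index(props, entities):
    """Sorted rows of (indexed values..., key) for entities having every prop."""
    return sorted(
        tuple(entity[p] for p in props) + (key,)
        for key, entity in entities.items()
        if all(p in entity for p in props)
    )


def run_query(entities, indexes, filters):
    """Toy non-projection query with equality filters {property: value}.

    `indexes` maps a tuple of property names to that index's sorted rows.
    Returns (matching keys, fetched entities).
    """
    # Step 1: find an index covering every filtered property, or fail.
    props = next((p for p in indexes if set(filters) <= set(p)), None)
    if props is None:
        raise NeedIndexError(f"no index covers {sorted(filters)}")

    # Step 2: scan the index, keeping only the pointers (keys) of matches.
    keys = [
        row[-1]
        for row in indexes[props]
        if all(row[props.index(name)] == value for name, value in filters.items())
    ]

    # Step 3: fetch each matching entity in full.
    fetched = [entities[key] for key in keys]
    return keys, fetched
```

For instance, with an index on `("done", "priority")`, filtering on `{"done": False}` returns the matching keys in index order and then fetches each entity, while filtering on a property no index covers raises `NeedIndexError`.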
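The point about schemaless kinds can be illustrated with two entities of the same kind that carry different properties. In this hypothetical sketch (the helper names are illustrative, not part of any Datastore client library), an index over `A`, `B`, and `C` covers both entities, yet rebuilding "full" entities from its rows silently drops `D` from the second one. Telling the two cases apart requires comparing against the stored entity, which is exactly the fetch an index-only query was meant to avoid.

```python
# Hypothetical toy model, not the Cloud Datastore API: two entities of the
# same kind whose property sets differ, plus an index over (A, B, C).
INDEXED = ("A", "B", "C")

entities = {
    "t1": {"A": 1, "B": "x", "C": True},
    "t2": {"A": 2, "B": "y", "C": False, "D": "only on t2"},  # extra property
}


def index_rows(props, entities):
    """Sorted (values..., key) rows for entities that have every indexed prop."""
    return sorted(
        tuple(entity[p] for p in props) + (key,)
        for key, entity in entities.items()
        if all(p in entity for p in props)
    )


def rebuild_from_index(props, rows):
    """The most an index-only 'SELECT *' could return: the indexed values."""
    return {row[-1]: dict(zip(props, row[:-1])) for row in rows}


rebuilt = rebuild_from_index(INDEXED, index_rows(INDEXED, entities))

# Checking whether the index captured each entity completely means reading
# entities[key], i.e. fetching the very entity the query hoped to skip.
complete = {key: rebuilt[key] == entities[key] for key in rebuilt}
print(complete)  # {'t1': True, 't2': False}
```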