google-cloud-datastore, gcloud-python, google-cloud-python

Datastore: Is there a plan to add GQLQuery support?


I am using the gcloud-python library for a project that needs to serve the following use case:

  • Get a batch of entities with a subset of its properties (projection)
  • gcloud.datastore.api.get_multi() provides me batch get but not projection
  • and gcloud.datastore.api.Query() provides me projection but not batch get (like an IN query)

AFAIK, GQLQuery provides both IN queries (batch get) and projections. Is there a plan to support GQLQuery in the gcloud-python library? Or is there another way to get batching and projection in a single request?


Solution

  • Currently there is no way to request a subset of an entity's properties when looking entities up by key. When you have the list of keys that you need, you should use get_multi().
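If only a few properties are needed after a batch get, the trimming has to happen on the client. Below is a minimal sketch, assuming the entities returned by the batch get behave like dictionaries. The names `get_multi_projected`, `fake_get_multi`, and `_STORE` are hypothetical and exist only for illustration; in real code you would pass in your own batch-lookup callable. This saves no Datastore work, because the full entities are still fetched.

```python
def get_multi_projected(get_multi, keys, properties):
    """Batch-get entities, then keep only the requested properties.

    `get_multi` is any callable that takes a list of keys and returns
    dict-like entities (a hypothetical stand-in for the client's batch get).
    The projection happens client-side, after the full entities arrive.
    """
    wanted = set(properties)
    return [
        {name: value for name, value in entity.items() if name in wanted}
        for entity in get_multi(keys)
    ]


# In-memory stand-in for a real batch lookup, for illustration only.
_STORE = {
    "k1": {"myFirstProp": "a", "mySecondProp": 1, "payload": "x" * 10},
    "k2": {"myFirstProp": "a", "mySecondProp": 2, "payload": "y" * 10},
}


def fake_get_multi(keys):
    # This stand-in skips keys that are not in the store.
    return [_STORE[key] for key in keys if key in _STORE]


projected = get_multi_projected(
    fake_get_multi, ["k1", "k2", "missing"], ["myFirstProp"]
)
```

The only saving here is in what your own code holds on to afterwards; the lookup cost and the network transfer are the same as for a plain get_multi().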

    Projection Query Background

    In Datastore, projection queries are simply index scans.

    For example, suppose you write the query SELECT * FROM MyKind ORDER BY myFirstProp, mySecondProp. This query executes against an index, Index(MyKind, myFirstProp, mySecondProp), which may look something like:

    myFirstProp | mySecondProp | __key__
    ------------------------------------
    a             1              k1
    a             2              k2
    b             1              k3
    

    For each result in the index, Datastore then looks up the entity by the key associated with that index result. If you do a projection query where you project only myFirstProp or mySecondProp or both, Datastore can answer from the index alone and avoid the random-access lookup of the associated entity for each result. This is generally where the large performance gain from projections comes from, not from the savings of transporting less data over the network.

    Likewise, if you know the list of keys that you need, you can look up the entities directly by key; there is no need to consult an index first.
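The difference between the three access paths can be made concrete with a toy in-memory model. This is only a sketch of the idea described above, not Datastore's actual implementation; the class `ToyDatastore`, its methods, and the `other` property are hypothetical. It counts how many random-access entity lookups each path performs.

```python
class ToyDatastore:
    """Toy model: an entity table plus one composite index."""

    def __init__(self, entities):
        self.entities = dict(entities)
        # Index(MyKind, myFirstProp, mySecondProp): sorted rows ending in the key.
        self.index = sorted(
            (entity["myFirstProp"], entity["mySecondProp"], key)
            for key, entity in self.entities.items()
        )
        self.entity_lookups = 0  # counts random-access entity reads

    def _lookup(self, key):
        self.entity_lookups += 1
        return self.entities[key]

    def query_all(self):
        # SELECT *: scan the index, then fetch every entity by its key.
        return [self._lookup(key) for _, _, key in self.index]

    def query_projection(self):
        # Projection: answered from the index rows alone, no entity reads.
        return [
            {"myFirstProp": first, "mySecondProp": second}
            for first, second, _ in self.index
        ]

    def get_multi(self, keys):
        # Key lookup: no index scan, one entity read per key.
        return [self._lookup(key) for key in keys]


store = ToyDatastore({
    "k1": {"myFirstProp": "a", "mySecondProp": 1, "other": "blob1"},
    "k2": {"myFirstProp": "a", "mySecondProp": 2, "other": "blob2"},
    "k3": {"myFirstProp": "b", "mySecondProp": 1, "other": "blob3"},
})

full = store.query_all()
lookups_after_full = store.entity_lookups

projected = store.query_projection()
lookups_after_projection = store.entity_lookups

by_key = store.get_multi(["k2"])
lookups_after_get = store.entity_lookups
```

In this model the full query performs one entity read per index row, the projection performs none, and the key lookup performs exactly one read per requested key without touching the index.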

    IN Operator

    In Python GQL (not in the similar Cloud Datastore GQL), there is the IN operator, which allows you to write a query that looks something like:

    SELECT * FROM MyKind WHERE myFirstProp IN ['a', 'b']
    

    However, Datastore does not actually support this query natively. Inside the Python client, the query is converted into disjunctive normal form:

    SELECT * FROM MyKind WHERE myFirstProp = 'a'
    UNION
    SELECT * FROM MyKind WHERE myFirstProp = 'b'
    

    This means that for each value inside your IN, you issue a separate Datastore query.
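The fan-out can be sketched as follows. This is an illustrative model of the expansion, not the client's actual code; `expand_in_filter`, `run_in_query`, and the in-memory `fake_run_query` are hypothetical names, and filters are represented as plain `(property, operator, value)` tuples.

```python
def expand_in_filter(prop, values):
    """Rewrite `prop IN values` as one equality filter per distinct value."""
    distinct = list(dict.fromkeys(values))  # drop duplicates, keep order
    return [(prop, "=", value) for value in distinct]


def run_in_query(run_query, prop, values):
    """Run one sub-query per equality filter and merge the results by key.

    `run_query` is any callable that takes a (prop, op, value) filter and
    returns (key, entity) pairs (a hypothetical stand-in for a real query).
    """
    merged = {}
    for single_filter in expand_in_filter(prop, values):
        for key, entity in run_query(single_filter):
            merged.setdefault(key, entity)
    return merged


# In-memory stand-in for issuing a query, for illustration only.
ROWS = {
    "k1": {"myFirstProp": "a", "mySecondProp": 1},
    "k2": {"myFirstProp": "a", "mySecondProp": 2},
    "k3": {"myFirstProp": "b", "mySecondProp": 1},
}
issued = []  # records every sub-query that gets issued


def fake_run_query(single_filter):
    issued.append(single_filter)
    prop, _, value = single_filter
    return [(key, row) for key, row in ROWS.items() if row[prop] == value]


results = run_in_query(fake_run_query, "myFirstProp", ["a", "b"])
```

After this runs, `issued` holds two separate equality queries, one for 'a' and one for 'b', which is the cost the IN operator hides.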