I have a requirement to produce a recordset shoings a combined view of google bigquery and datastore data. The recordset needs to be returned to a flask webapp on app-engine that formats it into a form of a report.
Conceptually I would like to build the following query;
select bq.column1, bg.column2, bq.column3,
ds.column4, ds.column5, ds.column6
from bigquery bq join datastore ds on bq.uniquekey = ds.uniquekey
where bq.column10 in ('some list of filters')
The bigquery data holds transactional information (and is partitioned and tuned) and the datastore holds the equivalent of dimensional data. The unique key is the id column in the datastore
I'm able to build queries against both data sources in google app-engine using python. I am trying to avoid writing a new table that has the new combined data, instead providing a real time view of the relationship.
There are about 25k entities in the datastore growing at < 100 entities per day and 5M+ entities in biquery growing at 10000 records per hour. I need to return the report to a user in under 5 seconds.
Is there a way to do this in realtime without saving a view of the data?
There is no way to do this directly. Instead what you'll probably want to do is to do the query on BigQuery, then if ds.uniquekey is actually the Datastore entity keys, then you can do lookups on each Datastore entity from the results return from your BigQuery query.