Search code examples
google-bigquerygoogle-cloud-datalab

Parameters in Datalab SQL modules


The parameterization example in the "SQL Parameters" IPython notebook in the datalab github repo (under datalab/tutorials/BigQuery/) shows how to change the value being tested for in a WHERE clause. Is it possible to use a parameter to change the name of a field being SELECT'd on?

eg: SELECT COUNT(DISTINCT $a) AS n FROM [...]

After I received the answer below, here is what I have done (with a dummy table name and field name, obviously):

%%sql --module test01 DEFINE QUERY get_counts SELECT $a AS a, COUNT(*) AS n FROM [project_id.dataset_id.table_id] GROUP BY a ORDER BY n DESC

table = bq.Table('project_id.dataset_id.table_id') field = table.schema['field_name'] bq.Query(test01.get_counts,a=field).sql bq.Query(test01.get_counts,a=field).results()


Solution

  • You can use a field from a Schema object (eg. given a table, get a specific field via table.schema[fieldname]).

    Or implement a custom object with a _repr_sql_ method. See: https://github.com/GoogleCloudPlatform/datalab/blob/master/sources/lib/api/gcp/bigquery/_schema.py#L49