I'm still quite new to the Google Cloud Datalab and have some troubles executing parameterized queries.
I followed the example for passing query parameters from Datalab tutorial and tried to apply it on the following query:
%sql
SELECT user_id, localTime, event
FROM (SELECT user_id, DATE_ADD(date, timezoneOffset, "SECOND") AS localTime, event
FROM (TABLE_QUERY([my_project:my_dataset:user_events],
'table_id CONTAINS "user_events_0"
AND RIGHT(table_id, 8) BETWEEN "20160401" AND "20160408"'))
WHERE
user_id IS NOT NULL AND
timezoneOffset IS NOT NULL AND
event IS NOT NULL)
WHERE
user_id IN (SELECT id FROM [my_project:my_dataset.topUsers])
ORDER BY user_id, localTime
I want to iterate over all user_events tables, with indexes 0,1,2,3 ... For that purpose, I'd like to pass the parameter for the TABLE_QUERY and query every table in one iteration of a loop - not all of the tables at the same time. (Because I need to order user records within each table; the resources are exceeded when executing a query over all user_events tables at once)
1.) I defined a new query (%%sql --module topUserEvents
etc.) and replaced the following part from the query above:
FROM (TABLE_QUERY([my_project:my_dataset:user_events],
'table_id CONTAINS "user_events_0"
AND RIGHT(table_id, 8) BETWEEN "20160401" AND "20160408"'))
with:
FROM (TABLE_QUERY([my_project:my_dataset:user_events],
'table_id CONTAINS "user_events_'+$tableNr+
'" AND RIGHT(table_id, 8) BETWEEN "20160401" AND "20160408"'))
Executed the query, passing table number as a string - didn't work:
invalidQuery: Expected a string literal for TABLE_QUERY clause
2.) I also tried passing the whole string, replacing part of the original query with:
FROM (TABLE_QUERY([my_project:my_dataset:user_events], $tableString))
Executed the query, passing the whole string, returned Big Query Exception:
invalidQuery: Error preparing subsidiary query:
com.google.cloud.helix.server.bqsql.common.BigQueryException:
Encountered " "CONTAINS" "CONTAINS "" at line 1, column 94.
Was expecting:
")" ...
Does anyone know how to pass (a part of) a string for TABLE_QUERY parameter, such as in the case above?
Any help would be greatly appreciated :)
Can you try the following?
Define module 'test1':
%%sql --module test1
SELECT count(*)
FROM TABLE_QUERY(publicdata:samples,
'MSEC_TO_TIMESTAMP(creation_time) < DATE_ADD(CURRENT_TIMESTAMP(), -7, $period)')
Run query:
period = 'DAY'
bq.Query(test1, period = period).sample()
Define module 'test2':
%sql --module test2
SELECT user_id, localTime, event
FROM (SELECT user_id, DATE_ADD(date, timezoneOffset, "SECOND") AS localTime, event
FROM (TABLE_QUERY([my_project:my_dataset:user_events],
'table_id CONTAINS $events_table_num
AND RIGHT(table_id, 8) BETWEEN "20160401" AND "20160408"'))
WHERE
user_id IS NOT NULL AND
timezoneOffset IS NOT NULL AND
event IS NOT NULL)
WHERE
user_id IN (SELECT id FROM [my_project:my_dataset.topUsers])
ORDER BY user_id, localTime
Run query:
events_table_num = 'user_events_0'
bq.Query(test2,events_table_num = events_table_num).sample()