Search code examples
sqlipythonparameter-passinggoogle-bigquerygoogle-cloud-datalab

Passing parameters for TABLE_QUERY in Google Cloud Datalab iPython notebook


I'm still quite new to the Google Cloud Datalab and have some troubles executing parameterized queries.

I followed the example for passing query parameters from Datalab tutorial and tried to apply it on the following query:

%sql
SELECT user_id, localTime, event
FROM (SELECT user_id, DATE_ADD(date, timezoneOffset, "SECOND") AS localTime, event
  FROM (TABLE_QUERY([my_project:my_dataset:user_events], 
       'table_id CONTAINS "user_events_0" 
       AND RIGHT(table_id, 8) BETWEEN "20160401" AND "20160408"'))
  WHERE 
  user_id IS NOT NULL AND
  timezoneOffset IS NOT NULL AND
  event IS NOT NULL)
WHERE 
  user_id IN (SELECT id FROM [my_project:my_dataset.topUsers])
ORDER BY user_id, localTime

I want to iterate over all user_events tables, with indexes 0,1,2,3 ... For that purpose, I'd like to pass the parameter for the TABLE_QUERY and query every table in one iteration of a loop - not all of the tables at the same time. (Because I need to order user records within each table; the resources are exceeded when executing a query over all user_events tables at once)

1.) I defined a new query (%%sql --module topUserEvents etc.) and replaced the following part from the query above:

 FROM (TABLE_QUERY([my_project:my_dataset:user_events], 
      'table_id CONTAINS "user_events_0" 
       AND RIGHT(table_id, 8) BETWEEN "20160401" AND "20160408"'))

with:

  FROM (TABLE_QUERY([my_project:my_dataset:user_events], 
       'table_id CONTAINS "user_events_'+$tableNr+ 
       '" AND RIGHT(table_id, 8) BETWEEN "20160401" AND "20160408"'))

Executed the query, passing table number as a string - didn't work:

invalidQuery: Expected a string literal for TABLE_QUERY clause

2.) I also tried passing the whole string, replacing part of the original query with:

  FROM (TABLE_QUERY([my_project:my_dataset:user_events], $tableString))

Executed the query, passing the whole string, returned Big Query Exception:

invalidQuery: Error preparing subsidiary query:
com.google.cloud.helix.server.bqsql.common.BigQueryException:
Encountered " "CONTAINS" "CONTAINS "" at line 1, column 94.
Was expecting:
")" ...

Does anyone know how to pass (a part of) a string for TABLE_QUERY parameter, such as in the case above?

Any help would be greatly appreciated :)


Solution

  • Can you try the following?

    Define module 'test1':

    %%sql --module test1
    SELECT count(*)
    FROM TABLE_QUERY(publicdata:samples, 
      'MSEC_TO_TIMESTAMP(creation_time) < DATE_ADD(CURRENT_TIMESTAMP(), -7, $period)')
    

    Run query:

    period = 'DAY'
    bq.Query(test1, period = period).sample()
    

    Define module 'test2':

    %sql --module test2
    SELECT user_id, localTime, event
    FROM (SELECT user_id, DATE_ADD(date, timezoneOffset, "SECOND") AS localTime, event
      FROM (TABLE_QUERY([my_project:my_dataset:user_events], 
           'table_id CONTAINS $events_table_num 
           AND RIGHT(table_id, 8) BETWEEN "20160401" AND "20160408"'))
      WHERE 
      user_id IS NOT NULL AND
      timezoneOffset IS NOT NULL AND
      event IS NOT NULL)
    WHERE 
      user_id IN (SELECT id FROM [my_project:my_dataset.topUsers])
    ORDER BY user_id, localTime
    

    Run query:

    events_table_num = 'user_events_0'
    bq.Query(test2,events_table_num = events_table_num).sample()