Search code examples
google-bigquerygoogle-cloud-vertex-ai

Vertex BigQueryCreateModelJobOp `query_parameters` Format


The Python SDK for BigQuery components shows an option for passing query_parameters to the BigqueryCreateModelJobOp PipelineTask. This task shows that it expects a list. I can't find any example of how to specify parameters as a list here. For a specific example:

BigqueryCreateModelJobOp(
                    project="silly-demo-project",
                    location="country-region1",
                    query="SELECT * FROM {{ params.table }}",
                    query_parameters=["table=silly-demo-project.dataset.cooltable"]
                )

The above doesn't work - so what should query_parameters look like here?


Solution

  • The query_parameters argument is for BigQuery parametrized queries. Unfortunately there is no usage documentation, but vertex pipelines wrap normal sdk features by calling some python script over a shell with serialized payload as cmd argument.

    For each JobOp there is a respective Job class which serves as a calling client for a remote service in the cloud e.g BigqueryCreateModelJob. And these Job classes link to the documentation of the serialization format.

    In this case the query_parameters are part of JobConfigurationQuery. and must be passed as serialized string. The actual parameters are encoded by QueryParameter object

    The easiest way to find out the serialized string represenation of the parameter type is by spinning up a python shell or notebook and use the actual python-bigquery sdk client.

    from google.cloud import bigquery
    s = bigquery.ScalarQueryParameterType("INT64")
    s.to_api_repr()
    >>> {'type': 'INT64'}
    a = bigquery.ArrayQueryParameterType(s)
    a.to_api_repr()
    >>> {'type': 'ARRAY', 'arrayType': {'type': 'INT64'}}
    

    The ConcatPlaceholder from BigQuery pipeilne componets sdk takes care of converting simple python datatypes to string. Hence your pipeline can use this component like this:

    query_parameters = [
        {
             "name": "table",
             "parameterType": {"type": "INT64"},
             "parameterValue": {"value": "silly-demo-project.dataset.cooltable"},
        }
    ]
    
    BigqueryCreateModelJobOp(
      ...
      query="SELECT * FROM @table",
      query_parameters=query_parameters,
    )