Search code examples
jsongoogle-bigqueryuser-defined-functionstalend

External UDF call in BigQuery through CLI


I am trying out querying the table using UDF in BigQuery[or through Talend] in my preferred JSON as output. I have gone through the link which explains usage of Inline and external UDFs. But I could not figure a way to execute UDFs from CLI.

Is it possible to execute the external UDF from CLI [bq or gsutil], which I can use it via Talend Data Integerator tool.Can anyone suggest any pointer to this?


Solution

  • You can run a UDF via the "bq" command line tool by specifying the --udf_resource flag. You can set the flag value to a gs:// URL or to the name of a local file.

    For example, you can run the urlDecode UDF from the UDF documentation as follows:

    $ cat urldecode.js
    // UDF definition
    function urlDecode(row, emit) {
      emit({title: decodeHelper(row.title),
            requests: row.num_requests});
    }
    
    // Helper function with error handling
    function decodeHelper(s) {
      try {
        return decodeURI(s);
      } catch (ex) {
        return s;
      }
    }
    
    // UDF registration
    bigquery.defineFunction(
      'urlDecode',  // Name used to call the function from SQL
    
      ['title', 'num_requests'],  // Input column names
    
      // JSON representation of the output schema
      [{name: 'title', type: 'string'},
       {name: 'requests', type: 'integer'}],
    
      urlDecode  // The function reference
    );
    
    $ cat query.sql
    SELECT requests, title
    FROM
      urlDecode(
        SELECT
          title, sum(requests) AS num_requests
        FROM
          [fh-bigquery:wikipedia.pagecounts_201504]
        WHERE language = 'fr'
        GROUP EACH BY title
      )
    WHERE title LIKE '%ç%'
    ORDER BY requests DESC
    LIMIT 100
    
    $ bq query --udf_resource=urldecode.js "$(cat query.sql)"