Search code examples
google-bigqueryuser-defined-functionsudf

Using Big Query UDF during select


I have a table, that contains uuid and some other values for users.

Also, I have the following UDF function in BQ that calculates user_group by given uuid, based on MD5 and crc32 values:

function GetGroup(uuid) {...
}

function getUserGroup(r, emit) {
emit ({group: GetGroup(String(r.uuid)), uuid: r.uuid
  });
}

bigquery.defineFunction(
'get_group',
['uuid'],    
[{'name': 'group', 'type': 'string'}, {'name': 'uuid', 'type': 'string'}],
getUserGroup);

So, to extract the user value I need to do something like:

SELECT
  group,
  uuid
FROM
  get_group(
    SELECT
      uuid
    FROM
      [MY_TABLE] )

This is not very useful, because I would like to extract the same user_group for uuid's in couple of different tables and I don't want to define function "get_group" every time I want to use it. Additionally, if I want all fields from the table I need to JOIN the result with original table or hardcode the table schema in UDF.

Is there any convenient way to use UDF similar to any built-in function in Big Query? For example:

SELECT
  uuid,  
  get_group(uuid)
FROM
  [ANY_TABLE_WITH_UUID_FIELD]

Solution

  • I think, the best option for you is to use Scalar User-Defined Functions introduced with BigQuery Standard SQL (see Enabling Standard SQL).

    This gives you great extent of composability that you are looking for
    Pay attention to Including external libraries feature. It allows you to reference your heavy code as an external code libraries, while keeping your inline code minimal / readable / etc.

    CREATE TEMP FUNCTION get_group(uuid STRING) 
      RETURNS STRING 
      LANGUAGE js AS 
    """
      return get_group(uuid);
    """
    
    OPTIONS (
      library="gs://your-bucket/path/to/your-lib.js"
    );
    
    SELECT
      uuid,  
      get_group(uuid)
    FROM
      [ANY_TABLE_WITH_UUID_FIELD]