I have a table, that contains uuid and some other values for users.
Also, I have the following UDF function in BQ that calculates user_group by given uuid, based on MD5 and crc32 values:
function GetGroup(uuid) {...
}
function getUserGroup(r, emit) {
emit ({group: GetGroup(String(r.uuid)), uuid: r.uuid
});
}
bigquery.defineFunction(
'get_group',
['uuid'],
[{'name': 'group', 'type': 'string'}, {'name': 'uuid', 'type': 'string'}],
getUserGroup);
So, to extract the user value I need to do something like:
SELECT
group,
uuid
FROM
get_group(
SELECT
uuid
FROM
[MY_TABLE] )
This is not very useful, because I would like to extract the same user_group for uuid's in couple of different tables and I don't want to define function "get_group" every time I want to use it. Additionally, if I want all fields from the table I need to JOIN the result with original table or hardcode the table schema in UDF.
Is there any convenient way to use UDF similar to any built-in function in Big Query? For example:
SELECT
uuid,
get_group(uuid)
FROM
[ANY_TABLE_WITH_UUID_FIELD]
I think, the best option for you is to use Scalar User-Defined Functions introduced with BigQuery Standard SQL (see Enabling Standard SQL).
This gives you great extent of composability that you are looking for
Pay attention to Including external libraries feature. It allows you to reference your heavy code as an external code libraries, while keeping your inline code minimal / readable / etc.
CREATE TEMP FUNCTION get_group(uuid STRING)
RETURNS STRING
LANGUAGE js AS
"""
return get_group(uuid);
"""
OPTIONS (
library="gs://your-bucket/path/to/your-lib.js"
);
SELECT
uuid,
get_group(uuid)
FROM
[ANY_TABLE_WITH_UUID_FIELD]