Is there a way to pass a comparison function to sort by a column in BigQuery? I am fine, if required, using a SQL or JS udf. This would be similar to something like Javascript's localeCompare
function, returning a -number, 0, or +number depending on if the first value is less than or equal to the second value.
This would allow doing something like:
const comparisonFunction = (val1, val2) => ...
MyValues.sort(comparisonFunction)
For example, if I have the following data:
with tbl as (
select "date" val union all
select "time" union all
select "number"
)
select * from tbl order by comparisonFunction(tbl.val)
And for arguments sake, let's say the comparison function is something like:
const RANK_MAP = {"number": 1, "time": 2, "date": 3}
function comparisonFunction(val1, val2) {
return RANK_MAP[val1] - RANK_MAP[val2];
}
Note that I want this signature (or something like it). I'm not looking for 'converting this into an inline sql function' with something like:
with tbl as (
select "date" val union all
select "time" union all
select "number"
)
select
tbl.val,
case tbl.val
when 'number' then 1
when 'time' then 2
when 'date' then 3
end rank
from tbl
order by rank
How could this be done with a udf function similar to the js approach above?
BigQuery introduced collations to enable sorting independantly of the lower or upper-case. However, the language specified sorting is not yet available.
For your example, you showed already the possibility to replace each string by a pseudo value and to sort by it. This solutions is good, because it scales in parallel.
If you want to use the JavaScript sort function, the UDF is capable of doing so. However, your dataset should not extend several hundreds rows. All strings are combined in the table helper
to an array. This array is sorted in the UDF. Then the sorted array is joined back to the main table and the offset of the array sort_by
is the column you want to sort the table by.
create temp function sortme(MyValues array<string>)
returns array<string>
language js as
"""
fct1 = (a,b) => a.localeCompare(b, 'en', { sensitivity: 'base' }) ;
const RANK_MAP = {"number": 1, "time": 2, "date": 3};
function fct2(val1, val2) {
return RANK_MAP[val1] - RANK_MAP[val2];
}
MyValues.sort(fct2)
return MyValues
""";
with tbl as (
select "date" val union all
select "time" union all
select "number"
),
helper as (
Select sortme(array_agg(distinct val)) sorted from tbl
)
select * from tbl
left join (select val, sort_by from helper,unnest(helper.sorted) as val with offset sort_by) using(val)
order by sort_by