How to concat one column by order after group by?

The dataset looks like this:

id	result	rank
001	pass	2
002	fail	3
001	fail	1
002	pass	1

What I want to do: group the dataset by id and concatenate the results in ascending order of rank column.

id	results
001	fail-pass
002	pass-fail

As the other column's order is involved, the concat_ws('-',collect_set(result))function cannot fulfill my thought.

Are there any built-in functions to help me achieve this, or writing a UDF seems the only solution?

Solution

In a subquery before collect_set, distribute by id and sort by id, rank. Dataset will be distributed between reducers by id and sorted by rank before aggregation. See comments in the code.

Demo:

with demo_dataset as ( --Use your table instead of this CTE
select stack(4,
'001' , 'pass', 2,
'002' , 'fail', 3,
'001' , 'fail', 1,
'002' , 'pass', 1
) as (id,result,rank)
)

select id, concat_ws('-',collect_set(result))
from
(
select t.* 
  from demo_dataset t
distribute by id   --Distribute by grouping column
sort by id, rank   --Sort in required order
) s
group by id

Result:

id  results
001 fail-pass
002 pass-fail

Now if you change SORT: sort by id, rank desc you will get results ordered differently