I have a table with an identifier column id
and another column with string values column_b
for which I would like to do customer ordering on column_b
. Let's say that column_b
consists of values A, B, C, D.
The syntax that can be used in Hive for row_number() over() is:
SELECT id, column_b, row_number() over(partition by id order by column_b) as row_id
FROM some_table
Example see here for an example
However, I would like to do custom ordering on column_b
instead of alphabetic ordering. The above syntax would yield something like:
Instead, I would like to explicitly order by column_b using the order A, C, D, B
, that is:
How can I achieve this?
Explicitly specify the order using case statement. You can use other scalar functions in the order by
:
SELECT id, column_b,
row_number() over(partition by id order by case column_b
when 'A' then '1'
when 'C' then '2'
when 'D' then '3'
when 'B' then '4'
--add more cases
--for example other values sort
--in natural order
else column_b
--or use constant
--to make sure
--everything else is greater than 4
--like this else concat('5',column_b)
end
) as row_id
FROM some_table
Also you can calculate order column in the subquery and use it in the window, it will work the same:
SELECT id, column_b,
row_number() over(partition by id order by orderby) as row_id
FROM (select t.*,
case column_b
when 'A' then '1'
when 'C' then '2'
when 'D' then '3'
when 'B' then '4'
else concat('5',column_b)
end orderby
from some_table t
) s