Search code examples
sqlhivehiveqlwindow-functions

Customised ordering in Hive's row_number() over partition by order by window function


I have a table with an identifier column id and another column with string values column_b for which I would like to do customer ordering on column_b. Let's say that column_b consists of values A, B, C, D.

The syntax that can be used in Hive for row_number() over() is:

SELECT id, column_b, row_number() over(partition by id order by column_b) as row_id
FROM   some_table

Example see here for an example

However, I would like to do custom ordering on column_b instead of alphabetic ordering. The above syntax would yield something like:

enter image description here

Instead, I would like to explicitly order by column_b using the order A, C, D, B, that is:

hive

How can I achieve this?


Solution

  • Explicitly specify the order using case statement. You can use other scalar functions in the order by:

    SELECT id, column_b, 
           row_number() over(partition by id order by case column_b
                                                           when 'A' then '1'
                                                           when 'C' then '2'
                                                           when 'D' then '3'
                                                           when 'B' then '4'
                                                           --add more cases
                                                           --for example other values sort
                                                           --in natural order  
                                                           else column_b 
                                                           --or use constant
                                                           --to make sure 
                                                           --everything else is greater than 4 
                                                           --like this  else concat('5',column_b)
                                                      end 
                             ) as row_id
      FROM some_table
    

    Also you can calculate order column in the subquery and use it in the window, it will work the same:

    SELECT id, column_b, 
           row_number() over(partition by id order by orderby) as row_id
      FROM (select t.*, 
                   case column_b
                        when 'A' then '1'
                        when 'C' then '2'
                        when 'D' then '3'
                        when 'B' then '4'  
                        else concat('5',column_b) 
                   end orderby
               from some_table t
            ) s