Search code examples
oraclehivehiveql

How to implement Oracle's "func(...) keep (dense_rank ...)" In Hive


I have a table abcd in Oracle DB

  +-------------+----------+
  | abcd.speed  | abcd.ab  |
  +-------------+----------+
  | 4.0         | 2        |
  | 4.0         | 2        |
  | 7.0         | 2        |
  | 7.0         | 2        |
  | 8.0         | 1        |
  +-------------+----------+

And I'm using a query like this:

select min(speed) keep (dense_rank last order by abcd.ab NULLS FIRST) MOD from abcd;

I'm trying to convert the code to Hive, but it looks like keep is not available in Hive.
Could you suggest an equivalent statement?


Solution

  • select  -max(struct(ab,-speed)).col2 as mod
    from    abcd
    ;
    

    +------+
    | mod  |
    +------+
    | 4.0  |
    +------+
    

    Let start by explaining min(speed) keep (dense_rank last order by abcd.ab NULLS FIRST): Find the row(s) with the max value of ab.
    For this/those row(s), find the min value of speed.

    We are using 2 tricks here.
    The 1st is based on the ability to get the max value of a struct.
    max(struct(c1,c2,c3,...)) returns the same result as if you have sorted the structs by c1, then by c2, then by c3 etc. and then chose the last element.

    The 2nd trick is to use -speed (which is the same of -1*speed). Finding the max of -speed and then taking the minus of that value (which gives us speed), is the same of finding the min of speed.

    If we would have ordered the structs, it would have looked like this (since 2 is bigger than 1 and -4 is bigger than -7):

    +----+-------+
    | ab | speed |
    +----+-------+
    |  1 |  -8.0 |
    |  2 |  -7.0 |
    |  2 |  -7.0 |
    |  2 |  -4.0 |
    |  2 |  -4.0 |
    +----+-------+
    

    The last struct in this case in struct(2,-4.0), therefore this is the result of the max function.
    The fields names for a struct are col1, col2, col3 etc., so struct(2,-4.0).col2 is -4.0. and preceding it with minus (which is the same as multiple it by -1) as in -struct(2,-4.0).col2 is 4.0.