I have a table abcd in Oracle DB
+-------------+----------+
| abcd.speed | abcd.ab |
+-------------+----------+
| 4.0 | 2 |
| 4.0 | 2 |
| 7.0 | 2 |
| 7.0 | 2 |
| 8.0 | 1 |
+-------------+----------+
And I'm using a query like this:
select min(speed) keep (dense_rank last order by abcd.ab NULLS FIRST) MOD from abcd;
I'm trying to convert the code to Hive, but it looks like keep
is not available in Hive.
Could you suggest an equivalent statement?
select -max(struct(ab,-speed)).col2 as mod
from abcd
;
+------+
| mod |
+------+
| 4.0 |
+------+
Let start by explaining min(speed) keep (dense_rank last order by abcd.ab NULLS FIRST)
:
Find the row(s) with the max value of ab
.
For this/those row(s), find the min value of speed
.
We are using 2 tricks here.
The 1st is based on the ability to get the max value of a struct.
max(struct(c1,c2,c3,...))
returns the same result as if you have sorted the structs by c1
, then by c2
, then by c3
etc. and then chose the last element.
The 2nd trick is to use -speed
(which is the same of -1*speed
).
Finding the max of -speed
and then taking the minus of that value (which gives us speed
), is the same of finding the min of speed
.
If we would have ordered the structs, it would have looked like this (since 2 is bigger than 1 and -4 is bigger than -7):
+----+-------+
| ab | speed |
+----+-------+
| 1 | -8.0 |
| 2 | -7.0 |
| 2 | -7.0 |
| 2 | -4.0 |
| 2 | -4.0 |
+----+-------+
The last struct in this case in struct(2,-4.0)
, therefore this is the result of the max
function.
The fields names for a struct are col1
, col2
, col3
etc., so
struct(2,-4.0).col2
is -4.0
. and preceding it with minus (which is the same as multiple it by -1) as in -struct(2,-4.0).col2
is 4.0
.