Search code examples
sqlhadoophivehiveql

“Hive” max column value from multiple columns


Hi: I have a situation where I need to find the max value on 3 calculated fields and store it in another field, is it possible to do it in one SQL query? Below is the example

SELECT Income1 ,
       Income1 * 2%  as Personal_Income ,
       Income2 ,
       Income2 * 10% as Share_Income ,
       Income3 ,
       Income3 * 1%  as Job_Income , 
       Max(Personal_Income, Share_Income, Job_Income ) 
  From Table

One way I tried is to calculate Personal_Income, Share_Income, Job_Income in the first pass and in the second pass I used

Select 
      Case when Personal_income > Share_Income and Personal_Income > Job_Income 
                then Personal_income 
           when Share_income > Job_Income 
                then Share_income 
           Else Job_income as the greatest_income

but this require me to do 2 scans on a billion rows table, How can I avoid this and do it in a single pass? Any help much appreciated.


Solution

  • As of Hive 1.1.0 you can use greatest() function. This query will do in a single table scan:

    select Income1 ,
           Personal_Income ,
           Income2 ,
           Share_Income ,
           Income3 ,
           Job_Income ,
           greatest(Personal_Income, Share_Income, Job_Income ) as greatest_income
    from
    (
    SELECT Income1 ,
           Income1 * 2%  as Personal_Income ,
           Income2 ,
           Income2 * 10% as Share_Income ,
           Income3 ,
           Income3 * 1%  as Job_Income , 
      From Table
    )s
    ;