Search code examples
hadoophivesubquerybigdataderived-column

How to use derived columns in same hive table?


Could you please help me below query.

Suppose there is table employee and columns A , B and Date column.

I have to load data from table employee to another table emp with below transformation applied

Transformation in Employee table

  1. Absolute value of column A - (column name in emp wil be ABS_A)

  2. Absolute value of column B -(column name in emp wil be ABS_B)

  3. Find the sum(ABS_A) for a given Date column

4.Find the sum(ABS_b) for a given Date column

  1. Find sum(ABS_A)/sum(ABS_B) - column name will be Average.

So the Final table emp will have below columns

1.A

2.B

3.ABS_A

4.ABS_B

5.Average

How to handle such derived column in hive?

I tried below query but now working. could anyone guide me.

insert overwrite into emp
select 
A,
B,
ABS(A) as ABS_A,
ABS(B) as ABS_B,
sum(ABS_A) OVER PARTION BY DATE AS sum_OF_A,
sum(ABS_B) OVER PARTTION BY DATE AS sum_of_b,
avg(sum_of_A,sum_of_b) over partition by date as average
from employee

enter image description here

enter image description here


Solution

  • Hive does not support using derived columns in the same subquery level. Use subqueries or functions in place of column aliases.

    insert overwrite table emp 
    select A, B, ABS_A, ABS_B, sum_OF_A, sum_of_b, `date`, sum_OF_A/sum_of_b as average
    from      
    (
    select A, B, ABS(A) as ABS_A, ABS(B) as ABS_B, `date`,
           sum(ABS(A)) OVER (PARTTION BY DATE) AS sum_OF_A, 
           sum(ABS(B)) OVER (PARTTION BY DATE) AS sum_of_b
      from employee
    )s;