Search code examples
hadoophivehiveqlcommon-table-expressionhue

How to use a CTE to update a table - Hive


I'm new to hive and trying to update a table (table_A) with a new column based on the values retrieved from another table (table_B). table_B is used as a lookup table for the categorical variable descriptions.

Ex: table_A

index field_x field_y
1 L S
2 H H
3 M S

Ex: table_B

variable_name variable_category variable_category_description
field_x L Low
field_x M Medium
field_x H High
field_y S Soft
field_y H Hard

Based on the table_A and table_B I need to obtain following table.

index field_x field_x_description field_y field_y_description
1 L Low S Soft
2 H High H Hard
3 M Medium S Soft

I tried following in hue hive editor just to add one column at the beginning. My editor does not support update statements.

CREATE TABLE table_C AS 
SELECT index,
    field_x, 
    field_y, 
(SELECT variable_category_description
    FROM table_B 
    where table_B.variable_name = 'field_x' and  table_B.variable_category= table_A.field_x
AS field_x_description
FROM table_A

I get following error

Error while compiling statement: FAILED: ParseException line 5:5 cannot recognize input near 'SELECT' 'variable_category_description' 'FROM' in expression specification

A fix for this is much appreciated.


Solution

  • You need two joins (INNER or LEFT) with table_B

    CREATE TABLE table_C AS 
    SELECT a.index,
           a.field_x, 
           x.variable_category_description AS field_x_description,
           a.field_y, 
           y.variable_category_description AS field_y_description
    FROM table_A
         LEFT JOIN table_B x ON x.variable_name  = 'field_x' 
                            and x.variable_category = a.field_x
    
         LEFT JOIN table_B y ON y.variable_name  = 'field_y' 
                            and y.variable_category = a.field_y
    ;