Search code examples
hivehiveqlcloudera-manager

hive doesn't support merge function


trying to update the value from table to another table, both of these tables have the same field name but different values, the query must be work fine on any normal DB but here it returns

Error while compiling statement: FAILED: ParseException line 1:0 cannot recognize input near 'MERGE' 'INTO' 'FINAL'

MERGE 
INTO FINAL 
USING FIRST_STAGE
ON IMSI = FIRST_STAGE.IMSI and Site = FIRST_STAGE.Site
WHEN MATCHED THEN UPDATE SET
  Min_Date = least(FIRST_STAGE.Min_Date, Min_Date),
  Max_Date = greatest(FIRST_STAGE.Max_Date, Max_Date),
  NoofDays = FIRST_STAGE.NoofDays + NoofDays,
  Down_Link = FIRST_STAGE.Down_Link + Down_Link,
  up_Link = FIRST_STAGE.up_Link + up_Link,
  connection = FIRST_STAGE.connection + connection
WHEN NOT MATCHED THEN INSERT ( Min_Date, 
  Max_Date, 
  NoofDays, 
  IMSI, 
  Site, 
  Down_Link, 
  Up_Link, 
  Connection )
VALUES ( FIRST_STAGE.Min_Date, 
  FIRST_STAGE.Max_Date, 
  FIRST_STAGE.NoofDays, 
  FIRST_STAGE.IMSI, 
  FIRST_STAGE.Site, 
  FIRST_STAGE.Down_Link, 
  FIRST_STAGE.Up_Link, 
  FIRST_STAGE.Connection )

Solution

  • Hive merge statement is introduced in Hortonworks distribution.

    • Prerequisite for these merge statement to run is:

      Final table needs to be created with transactional enabled ,ORC format ,and bucketed.

    AFAIK In case of Cloudera distribution we need to use Kudu to perform upsert operations starting from cloudera-5.10+.

    Note: Upsert statement only works for Impala tables that use the Kudu storage engine.

    • I don't think we can run merge statements as mentioned in the post in CDH distributions as of now.