Search code examples
sqlhadoophiveclouderahive-metastore

How to compare two columns with different data type groups


This is an extension of a question I posed yesterday: How to handle potential data loss when performing comparisons across data types in different groups

In HIVE, is it possible to perform comparisons between two columns that are in different data type groups inline within the SELECT clause? I need to first determine what the incoming meta data is for each column and then provide logic that picks what CAST to use.

CASE 
    WHEN  Column1 <=> Column2 THEN 0 -- Error occurs here if data types are in different data type groups (i.e., BIGINT <=> STRING)
    ELSE 1
END

If this is not possible, is there a workaround to retrieve the meta data and perform CAST'ing based on certain rules? For example, if:

  1. Column1 is BIGINT
  2. Column2 is STRING

Then CAST Column2 as BIGINT and then perform the comparison. The simplest solution possible is desired. I'm hoping to not have to make substantial code changes to the underlying system that is generating the SELECT statement.

Update 2: There is a back-end system that generates the SQOOP queries and in that system there is a function that modifies the SELECT clause to compare two columns. The problem we are running into is that the columns used to be implicitly converted on an earlier version of HIVE/Cloudera that we were running. Now that we are upgrading to a version that does not support implicit data type conversion, we need to find the simplest solution to continue supporting the data type conversion. Unfortunately, there is no meta data available within the back-end, so I am trying to hack it out in the SELECT clause


Solution

  • Cast implicitly is good idea:

    CASE 
        WHEN  Column1 <=> cast(Column2 as bigint) THEN 0 
        ELSE 1
    END
    

    If not possible to cast, the result of cast is NULL.