Search code examples
hadoopapache-sparkhivehiveqlshark-sql

Return Boolean (1 or 0) if table contains duplicate rows


I wish to return a boolean value if there are duplicates in the table in Hive 0.9 For now, I'm doing this :

select cast(case when count(*) > 0 then 1 else 0 end as smallint) Validate_Value 
from  (
    select guid, count(guid) cnt 
    from default.vms_feed v 
    group by guid
) where v.cnt > 1 ;

But this gives me an error:

"cannot recognize input near 'where' 'v' '.' in subquery source [DB Errorcode=11] "

I'm not sure where I'm going wrong or what I'm missing out here !


Solution

  • Correct syntax for Hive subquery is:

    SELECT ... FROM (subquery) name ...
    

    So in your case

    SELECT CAST(...) Validate_Value 
    FROM (
       ... 
    ) v WHERE v.cnt > 1 ;