Search code examples
hivehqlunionhue

Union distinct in Hive


How do I combine rows while removing duplicate rows?

I tried UNION by itself as well as UNION DISTINCT but both returned error messages in Hue.

error while compiling statement: failed: parseexception line 5:10 mismatched input 'distinct' expecting all near 'union' in set operator

SELECT DISTINCT(product1.user)
FROM product1
UNION 
SELECT DISTINCT(product2.user)
FROM product2
UNION
SELECT DISTINCT(product3.user)
FROM product3

Solution

  • As answered by @Andrew in the comments above:

    If you are on a Hive version prior to 1.2, only union all is supported. So you'll have to use union all, and wrap an outer query around it for the distinct:

    select distinct user from (select user from product1 union all select user from product2...) t