Let's say we have a hive table with 4 different columns and I want to select from it for those values in the first column while make sure the values are different in the second column. Any help or guidance to how to do it?
---------------------
| C1 | C2 | C3 | C4 |
---------------------
| a 1 g. h |
| a 1 f. l |
| a 3 t. p |
| b 1 r. o |
| b 1 e. q |
| c 1 w. w |
| c 2 z. p |
-------------------
In the above example, I want to hive select return a
and c
because their values at C2 are different.
As I understand your question, you want c1
s that have more than one distinct value in c2
.
You can group by c1
, and use a having
clause with count(distinct)
to implement the filtering:
select c1
from mytable
group by c1
having count(distinct c2) > 1;