Search code examples
sqlselectgroup-byhivehaving-clause

Hive select a column based on a second column where the second column values are different


Let's say we have a hive table with 4 different columns and I want to select from it for those values in the first column while make sure the values are different in the second column. Any help or guidance to how to do it?

  ---------------------
  | C1 | C2 | C3 | C4 |
  ---------------------
  | a     1    g.   h |
  | a     1    f.   l |   
  | a     3    t.   p |  
  | b     1    r.   o |  
  | b     1    e.   q |
  | c     1    w.   w |
  | c     2    z.   p |
   -------------------

In the above example, I want to hive select return a and c because their values at C2 are different.


Solution

  • As I understand your question, you want c1s that have more than one distinct value in c2.

    You can group by c1, and use a having clause with count(distinct) to implement the filtering:

    select c1 
    from mytable
    group by c1
    having count(distinct c2) > 1;