Search code examples
hadoophivehqldistinct

Select distinct on specific columns but select other columns also in hive


I have multiple columns in a table in hive having around 80 columns. I need to apply the distinct clause on some of the columns and get the first values from the other columns also. Below is the representation of what I am trying to achieve.

select distinct(col1,col2,col3),col5,col6,col7
from abc where col1 = 'something';

All the columns mentioned above are text columns. So I cannot apply group by and aggregate functions.


Solution

  • You can use row_number function to solve the problem.

    create table temp as
    select *, row_number() over (partition by col1,col2,col3) as rn
    from abc 
    where col1 = 'something';
    
    select *
    from temp
    where rn=1
    

    You can also sort the table while partitioning.

    row_number() over (partition by col1,col2,col3 order by col4 asc) as rn