I have multiple columns in a table in hive having around 80 columns. I need to apply the distinct clause on some of the columns and get the first values from the other columns also. Below is the representation of what I am trying to achieve.
select distinct(col1,col2,col3),col5,col6,col7
from abc where col1 = 'something';
All the columns mentioned above are text columns. So I cannot apply group by and aggregate functions.
You can use row_number
function to solve the problem.
create table temp as
select *, row_number() over (partition by col1,col2,col3) as rn
from abc
where col1 = 'something';
select *
from temp
where rn=1
You can also sort the table while partitioning.
row_number() over (partition by col1,col2,col3 order by col4 asc) as rn