SQL Query in Hive to get top 2 values in 2 columns

I have a table like access(url, access_time), with possibly many access times per url.

I have another table that is asset(url, foo)

I want to do a query that turns this into joined_data(url, first_access_time, second_access_time)

Where first_access_time is NULL if there's no access time, and second_access_time is NULL if there's no 2nd access time.

How could I do this in hive?

Solution

You can use row_number to do this.

with twotimes as (select ast.url, a.access_time,
                  row_number() over(partition by a.url order by a.access_time) as rn
                  from asset ast 
                  left join access a on a.url = ast.url )
select url, max(first_access_time), max(second_access_time)
from (
select url, access_time as first_access_time, null as second_access_time
from twotimes where rn = 1
union all
select url, null as first_access_time, access_time as second_access_time
from twotimes where rn = 2
) t
group by url