mysql, sql, hive, ranking

SQL Query in Hive to get top 2 values in 2 columns


I have a table like access(url, access_time), with possibly many access times per url.

I have another table, asset(url, foo).

I want a query that joins these into joined_data(url, first_access_time, second_access_time), where first_access_time is NULL if the url has no access time and second_access_time is NULL if it has no second access time.

How can I do this in Hive?


Solution

  • You can use row_number() to do this: number each url's access times in ascending order, then pivot rows 1 and 2 into two columns.

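    with twotimes as (
      -- number each asset's access times, earliest first
      select ast.url, a.access_time,
             -- partition by ast.url, not a.url: a.url is NULL for assets with no
             -- access rows, which would put all of them in a single partition
             row_number() over (partition by ast.url order by a.access_time) as rn
      from asset ast
      left join access a on a.url = ast.url
    )
    select url,
           max(first_access_time) as first_access_time,
           max(second_access_time) as second_access_time
    from (
      select url, access_time as first_access_time, null as second_access_time
      from twotimes where rn = 1
      union all
      select url, null as first_access_time, access_time as second_access_time
      from twotimes where rn = 2
    ) t
    group by url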
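
The same pivot can probably be written without the union all, using conditional aggregation over the numbered rows. This is a sketch of an alternative rather than part of the original answer. It assumes the asset and access tables from the question and a Hive version that supports common table expressions and windowing functions; the CTE name ranked is made up for illustration.

```sql
-- Sketch only: table and column names come from the question;
-- "ranked" is a hypothetical CTE name.
with ranked as (
  select ast.url, a.access_time,
         -- partition by the asset's url so that assets with no access rows
         -- each get their own partition (with rn = 1 and a NULL access_time)
         row_number() over (partition by ast.url order by a.access_time) as rn
  from asset ast
  left join access a on a.url = ast.url
)
select url,
       -- a CASE with no ELSE yields NULL, and max() ignores NULLs
       max(case when rn = 1 then access_time end) as first_access_time,
       max(case when rn = 2 then access_time end) as second_access_time
from ranked
where rn <= 2
group by url;
```

Because each CASE expression returns NULL when its row number is absent, a url with a single access time gets a NULL second_access_time, and a url with no access rows gets NULL in both columns. It also avoids the untyped null literals in the union all branches.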