Search code examples
sqlpostgresqlinner-joinwhere-clausegreatest-n-per-group

Select only those users who have the most visits to provided district


I have a query that selects users with the districts which they visited and visits count.

select users.id, places.district, count(users.id) as counts from users
  left join visits on users.id = visits.user_id
  inner join places on visits.place_id = places.id
group by users.id, places.district

I need to select only those users who have visited provided district the most. For example, I have a user with id 1 who visited district A one time and district B three times. If I provide district B as parameter, user 1 will be in select. If I want to select users from district A, user 1 will not be in select.


Solution

  • I think that's ranking, then filtering:

    select *
    from (
        select u.id, p.district, count(*) as cnt_visits,
            rank() over(partition by u.id order by count(*) desc)
        from users u
        inner join visits v on u.id = v.user_id
        inner join places p on p.id = v.place_id
        group by u.id, p.district
    ) t
    where rn = 1 and district = ?
    

    Note that you don't actually need table users to get this result. We could simplify the query as:

    select *
    from (
        select v.user_id, p.district, count(*) as cnt_visits,
            rank() over(partition by u.id order by count(*) desc)
        from visits v
        inner join places p on p.id = v.place_id
        group by v.user_id, p.district
    ) t
    where rn = 1 and district = ?
    

    This query handles top ties: if a user had the same, maximum number of visits in two different districts, both are taken into account. If you don't need that feature, then we can simplify the subquery with distinct on:

    select *
    from (
        select distinct on (v.user_id) v.user_id, p.district, count(*) as cnt_visits
        from visits v
        inner join places p on p.id = v.place_id
        group by v.user_id, p.district
        order by v.user_id, cnt_visits desc
    ) t
    where district = ?