Search code examples
mysqlsqlgreatest-n-per-group

MySQL: Greatest n per group with joins and conditions


Table Structure

I have a table similar to the following:

venues

The following table describes a list of businesses

id    name
50    Nando's
60    KFC

rewards

The table describes a number of rewards, the venue it corresponds to and the number of points needed to redeem the reward.

id    venue_id    name        points
1     50          5% off      10
2     50          10% off     20
3     50          11% off     30
4     50          15% off     40
5     50          20% off     50
6     50          30% off     50
7     60          30% off     70
8     60          60% off     100
9     60          65% off     120
10    60          70% off     130
11    60          80% off     140

points_data

The table describes the number of points remaining a user has for each venue.

venue_id    points_remaining
50           30
60          90

Note that this query is actually computed within SQL like so:

select * from (
  select venue_id, (total_points - points_redeemed) as points_remaining
  from (
         select venue_id, sum(total_points) as total_points, sum(points_redeemed) as points_redeemed
         from (
                (
                  select venue_id, sum(points) as total_points, 0 as points_redeemed
                  from check_ins
                  group by venue_id
                )
                UNION
                (
                  select venue_id, 0 as total_points, sum(points) as points_redeemed
                  from reward_redemptions rr
                    join rewards r on rr.reward_id = r.id
                  group by venue_id
                )
              ) a
         group by venue_id
       ) b
  GROUP BY venue_id
) points_data

but for this question you can probably just ignore that massive query and assume the table is just called points_data.

Desired Output

I want to get a single query that gets:

  • The top 2 rewards the user is eligible for each venue
  • The lowest 2 rewards the user is not yet eligible for for each venue

So for the above data, the output would be:

id    venue_id    name        points
2     50          10% off     20
3     50          11% off     30
4     50          15% off     40
5     50          20% off     50
7     60          30% off     70
8     60          60% off     100
9     60          65% off     120

What I got so far

The best solution I found so far is first getting the points_data, and then using code (i.e. PHP) to dynamically write the following:

(
  select * from rewards
  where venue_id = 50
  and points > 30
  ORDER BY points desc
  LIMIT 2
)
union all
(
  select * from rewards
  where venue_id = 50
        and points <= 30
  ORDER BY points desc
  LIMIT 2
)
UNION ALL
(
  select * from rewards
  where venue_id = 60
        and points <= 90
  ORDER BY points desc
  LIMIT 2
)
UNION ALL
(
  select * from rewards
  where venue_id = 60
        and points > 90
  ORDER BY points desc
  LIMIT 2
)
ORDER BY venue_id, points asc;

However, I feel the query can get a bit too long and in-efficient. For example, if a user has points in 400 venues, that is 800 sub-queries.

I tried also doing a join like so, but can't really get better than:

select * from points_data
INNER JOIN rewards on rewards.venue_id = points_data.venue_id
where points > points_remaining;

which is far from what I want.


Solution

  • Correlated subqueries counting the number of higher or lower rewards to determine the top or bottom entries are one way.

    SELECT r1.*
           FROM rewards r1
                INNER JOIN points_data pd1
                           ON pd1.venue_id = r1.venue_id
           WHERE r1.points <= pd1.points_remaining
                 AND (SELECT count(*)
                             FROM rewards r2
                             WHERE r2.venue_id = r1.venue_id
                                   AND r2.points <= pd1.points_remaining
                                   AND (r2.points > r1.points
                                         OR r2.points = r1.points
                                            AND r2.id > r1.id)) < 2
                  OR r1.points > pd1.points_remaining
                     AND (SELECT count(*)
                                 FROM rewards r2
                                 WHERE r2.venue_id = r1.venue_id
                                       AND r2.points > pd1.points_remaining
                                       AND (r2.points < r1.points
                                             OR r2.points = r1.points
                                                AND r2.id < r1.id)) < 2
           ORDER BY r1.venue_id,
                    r1.points;
    

    SQL Fiddle

    Since MySQL 8.0 a solution using the row_number() window function would be an alternative. But I suppose you are on a lower version.

    SELECT x.id,
           x.venue_id,
           x.name,
           x.points
           FROM (SELECT r.id,
                        r.venue_id,
                        r.name,
                        r.points,
                        pd.points_remaining,
                        row_number() OVER (PARTITION BY r.venue_id,
                                                        r.points <= pd.points_remaining
                                           ORDER BY r.points DESC) rntop,
                        row_number() OVER (PARTITION BY r.venue_id,
                                                        r.points > pd.points_remaining
                                           ORDER BY r.points ASC) rnbottom
                        FROM rewards r
                             INNER JOIN points_data pd
                                        ON pd.venue_id = r.venue_id) x
           WHERE x.points <= x.points_remaining
                 AND x.rntop <= 2
                  OR x.points > x.points_remaining
                     AND x.rnbottom <= 2
           ORDER BY x.venue_id,
                    x.points;
    

    db<>fiddle

    The tricky part is here to partition the set also into the subset where the points of the user are enough to redeem the reward and the one where the points aren't enough, per venue. But as in MySQL logical expressions evaluate to 0 or 1 (in non Boolean context), the respective expressions can be used for that.