Search code examples
mysqlgreatest-n-per-groupwindow-functions

using FORCE INDEX to ensure the table is ordered with GROUP BY and ORDER BY before calculating user variables


I am trying to sum the nth highest rows.

I am calculating a cycling league table where 1st fastest rider at an event gets 50 points 2nd fastest 49 points and so on .... there are 10 events over the league but only a rider's 8 best results are used (this means a rider can miss up to 2 events without a catastrophic decent down the leader board) first i need a table where each rider's results from all events in the league are grouped together and listed in order of highest points, and then a sequential number calculated so i can sum the 8 or less best results. so i used this table select:

set @r := 0, @rn := 0 ;
SELECT 
t.*,
@rn := if(@r = t.id_rider, @rn + 1, 1) as seqnum,
@r :=  t.id_rider as dummy_rider
from results as t 
ORDER BY t.id_rider, t.points desc

where the table results is a view as below:

SELECT 
        a.id_rider,
        b.id_event,
        b.race_no,
        b.id_race,
        b.id_race_type,
        b.`position`,
        c.id_league,
        (51 - b.`position`) AS points
    FROM
    wp_dtk_start_sheet a
    JOIN wp_dtk_position_results b ON a.id_event = b.id_event AND a.race_no = b.race_no
    JOIN wp_dtk_league_races c ON b.id_race = c.id_race
    WHERE
    c.id_league = 1
    AND b.`position` IS NOT NULL

this does not work as the seqnum is 1 for all results. if i export the view table into excel and crate a test table with the same columns and data it works ok. i believe what is going wrong is that the table is not being sorted by ORDER BY t.id_rider, t.points desc before running through the variables

this reference: https://www.xaprb.com/blog/2006/12/07/how-to-select-the-firstleastmax-row-per-group-in-sql/ states " This technique is pretty much non-deterministic, because it relies on things that you and I don’t get to control directly, such as which indexes MySQL decides to use for grouping"

this reference suggest trying to force the index to use id_rider so i tried:

set @r := 0, @rn := 0 ;
SELECT 
a.id_rider, 
c.id_league, 
(51- b.`position`) as points,
@rn := if(@r = a.id_rider, @rn + 1, 1) as seqnum,
@r :=  a.id_rider as 'set r'
from wp_dtk_start_sheet as a force index (id_rider)
join wp_dtk_position_results as b on a.id_event = b.id_event and a.race_no = b.race_no
join wp_dtk_league_races as c on b.id_race = c.id_race
where c.id_league = 1 and b.`position` is not null   
ORDER BY a.id_rider, points desc

this did not work i got seqnum =1 for all rows as before

my table structure is as below:

table a - wp_dtk_start_sheet

table a - wp_dtk_start_sheet

table b - wp_dtk_position_results

table b - wp_dtk_position_results

table c -wp_dtk_league_races

table c -wp_dtk_league_races

this stack overlow answer was also very helpfull but also has the same problem with it: Sum Top 10 Values

can anyone help? perhaps i am going about this all the wrong way?


Solution

  • The solution is much more clear if you use window functions. This allows you to specify the order of rows within each group for purposes of row-numbering.

    SELECT t.*
    FROM (
      SELECT *, ROW_NUMBER() OVER (PARTITION BY id_rider ORDER BY points DESC) AS seqnum
      FROM results
    ) AS t
    WHERE t.seqnum <= 8;
    

    Support for window functions in MySQL was introduced in version 8.0, so you might have to upgrade. But it's been part of the MySQL product since 2018.