Search code examples
mysqlsqlmaxgreatest-n-per-group

Select rows with Max(Column Value) for each unique combination of two other columns


I'm working with a sample table like below. A Dataset has multiple groups, and each time a write to the table occurs, the RunNumber increments for the dataset, along with data for each group and the total. Each Dataset/Group combo will usually have multiple rows, example below:

RunNumber Group Dataset Total
1 Group1 Dataset A 10
1 Group1 Dataset A 20
2 Group1 Dataset A 30
2 Group2 Dataset A 15
1 Group1 Dataset B 5
1 Group2 Dataset B 10
1 Group3 Dataset A 30
2 Group3 Dataset A 30
1 Group1 Dataset C 15
1 Group2 Dataset C 50
2 Group2 Dataset C 70
2 Group2 Dataset C 90

What I want to do is essential for each combination of Dataset and Group, return all data for rows that have the max(RunNumber) for the given Dataset/Group combination. So for example, the above sample would return this:

RunNumber Group Dataset Total
2 Group1 Dataset A 30
2 Group2 Dataset A 15
1 Group1 Dataset B 5
1 Group2 Dataset B 10
2 Group3 Dataset A 30
1 Group1 Dataset C 15
2 Group2 Dataset C 70
2 Group2 Dataset C 90

Where the Dataset/Groups match, all rows are kept with the max RunNumber for that given combo. For now, I've split this into 2 separate queries, where i first query for the max(RunNumber) for all distinct Dataset/Group combos, then do a select * for all matches. Any help would be appreciated, thanks in advance!


Solution

  • In MySQL 5.x you can use a sub-query.

    SELECT * 
    FROM your_table
    WHERE (`Group`, Dataset, RunNumber) IN (
        SELECT `Group`, Dataset, MAX(RunNumber) AS MaxRunNumber
        FROM your_table
        GROUP BY `Group`, Dataset
      );
    

    Test on db<>fiddle here

    Alternatives

    --
    -- LEFT JOIN on bigger
    --
    SELECT t.* 
    FROM your_table t
    LEFT JOIN your_table t2
      ON t2.`Group` = t.`Group`
     AND t2.Dataset = t.Dataset
     AND t2.RunNumber > t.RunNumber
    WHERE t2.RunNumber IS NULL
    ORDER BY t.`Group`, t.Dataset;
    
    --
    -- where NOT EXISTS on bigger
    --
    SELECT * 
    FROM your_table t
    WHERE NOT EXISTS (
      SELECT 1
      FROM your_table t2
      WHERE t2.`Group` = t.`Group`
        AND t2.Dataset = t.Dataset
        AND t2.RunNumber > t.RunNumber
    )
    ORDER BY `Group`, Dataset;
    
    --
    -- Emulating DENSE_RANK = 1 with variables
    -- Works also in 5.x
    --
    SELECT RunNumber, `Group`, Dataset, Total
    FROM
    (
      SELECT 
        @rnk:=IF(@ds=Dataset AND @grp=`Group`, IF(@run=RunNumber, @rnk, @rnk+1), 1) AS Rnk
      , @grp := `Group` as `Group`
      , @ds := Dataset as Dataset
      , @run := RunNumber as RunNumber
      , Total
      FROM your_table t
      CROSS JOIN (SELECT @grp:=null, @ds:=null, @run:=null, @rnk := 0) var
      ORDER BY `Group`, Dataset, RunNumber DESC
    ) q
    WHERE Rnk = 1
    ORDER BY `Group`, Dataset;
    
    --
    -- DENSE_RANK = 1
    -- MySql 8 and beyond. 
    --
    SELECT *
    FROM
    (
      SELECT *
      , DENSE_RANK() OVER (PARTITION BY `Group`, Dataset ORDER BY RunNumber DESC) AS rnk
      FROM your_table
    ) q
    WHERE rnk = 1
    ORDER BY `Group`, Dataset;