Search code examples
mysqldbvisualizer

SQL Query sorting rows by duplicate name keeping lowest in result


I've got a table with 11 columns and I want to create a query that removes the rows with duplicate names in the Full Name's column but keeps the row with the lowest value in the Result's column. Currently I have this.

SELECT
    MIN(sql363686.Results2014.Result),
    sql363686.Results2014.Temp,
    sql363686.Results2014.Full Name,
    sql363686.Results2014.Province,
    sql363686.Results2014.BirthDate,
    sql363686.Results2014.Position,
    sql363686.Results2014.Location,
    sql363686.Results2014.Date   
FROM
    sql363686.Results2014
WHERE
    sql363686.Results2014.Event = '50m Freestyle'
AND sql363686.Results2014.Gender = 'M'
AND sql363686.Results2014.Agegroup = 'Junior'
GROUP BY
    sql363686.Results2014.Full Name
ORDER BY
    sql363686.Results2014.Result ASC ;

At first glance it seems to work fine and I get all the correct values, but I seem to be getting a different (wrong) value in the Position column then what I have in my database table. All other values seem to be right. Any ideas on what I'm doing wrong?

I'm currently using dbVisualizer connected to a mysql database. Also, my knowledge and experience with sql is the bare mimimum


Solution

  • You have fallen into the trap of the nonstandard MySQL extension to GROUP BY.

    (I'm not going to work with all those fully qualified column names; it's unnecessary and verbose.)

    I think you're looking for each swimmer's best time in a particular event, and you're trying to pull that from a so-called denormalized table. It looks like your table has these columns.

    Result
    Temp
    FullName
    Province
    BirthDate
    Position
    Location
    Date   
    Event
    Gender
    Agegroup
    

    So, the first step is to locate the best time in each event for each swimmer. To do this we need to make a couple of assumptions.

    1. A person is uniquely identified by FullName, BirthDate, and Gender.
    2. An event is uniquely identified by Event, Gender, Agegroup.

    This subquery will get the best time for each swimmer in each event.

    SELECT MIN(Result) BestResult,
           FullName,BirthDate, Gender,
           Event, Agegroup
      FROM Results2014
     GROUP BY FullName,BirthDate, Gender, Event, Agegroup
    

    This gets you a virtual table with each person's fastest result in each event (using the definitions of person and event mentioned earlier).

    Now the challenge is to go find out the circumstances of each person's best time. Those circumstances include Temp, Province, Position, Location, Date. We'll do that with a JOIN between the original table and our virtual table, like this

    SELECT resu.Event,
           resu.Gender,
           resu.Agegroup,
           resu.Result,
           resu.Temp.
           resu.FullName,
           resu.Province,
           resu.BirthDate,
           resu.Position,
           resu.Location,
           resu.Date
      FROM Results2014 resu
      JOIN (
                   SELECT MIN(Result) BestResult,
                          FullName,BirthDate, Gender,
                          Event, Agegroup
                     FROM Results2014
                    GROUP BY FullName,BirthDate, Gender, Event, Agegroup
           ) best   
               ON  resu.Result    = best.BestResult
              AND  resu.FullName  = best.FullName
              AND  resu.BirthDate = best.BirthDate
              AND  resu.Gender    = best.Gender
              AND  resu.Event     = best.Event
              AND  resu.Agegroup  = best.Agegroup
     ORDER BY resu.Agegroup, resu.Gender, resu.Event, resu.FullName, resu.BirthDate
    

    Do you see how this works? You need an aggregate query that pulls the best times. Then you need to use the column values in that aggregate query in the ON clause to go get the details of the best times from the detail table.

    If you want to report on just one event you can include an appropriate WHERE clause right before ORDER BY as follows.

     WHERE resu.Event = '50m Freestyle'
       AND resu.Gender = 'M'
       AND resu.Agegroup = 'Junior'