Search code examples
sqlsql-serverdistinctsql-server-2019

SQL DISTINCT MIN() - What am I doing wrong?


I have recently started learning SQL, set up a server, tried to get some practice with a kaggle DB I have found.

I would like to see the recorder on each lap (meaning track name field should be distinct), however it lists all lap records instead.

I have tried:

select distinct 
    r.name, min(t.fastestlaptime) as 'Fastest lap', d.surname,  d.driverId, r.year
from
    races r
inner join 
    results t on r.raceID = t.raceID
inner join 
    drivers d on t.driverID = d.driverID
where 
    (t.fastestlaptime not in ('\N'))
group by 
    r.name, d.forename, d.surname, d.driverId, r.year
order by 
    r.name

I get this result:

name Fastest lap surname driverId year
70th Anniversary Grand Prix 1:28.451 Hamilton 1 2020
70th Anniversary Grand Prix 1:29.465 Verstappen 830 2020
70th Anniversary Grand Prix 1:29.477 Albon 848 2020
Abu Dhabi Grand Prix 1:26.103 Verstappen 830 2021
Abu Dhabi Grand Prix 1:26.419 Pérez 815 2021
Abu Dhabi Grand Prix 1:26.615 Hamilton 1 2021
Australian Grand Prix 1:28.321 Villeneuve 35 2006
Australian Grand Prix 1:28.336 Pérez 815 2017
Australian Grand Prix 1:28.416 Glock 10 2009

1

ofc, I just copy pasted some of the results, there are 7.6k rows, instead of just showing the rows with unique track names.

Can someone please explain to me? I'm malding not understanding the concept :D

I would like to list distinct track names, and the lap record, name of the recorder, recorder ID and year next to it.


Solution

  • In your SQL distinct is redundant. You are grouping by "d.forename, d.surname, d.driverId, r.year" besided "r.name", thus you would get the min() per:

    r.Name and then per each name, d.forename, d.surname, d.driverId, r.year as well. That means, for example say, Lando Norris didn't have the fastest on any tracks on any year but was on the grid, then you would also see all of Norris' fastest laps on all tracks on all years he was on the grid as well.

    If what you want is to list who got the fastest lap on each track and the timing, each year then what you need is to get the min() per track, year and then get the other info of the driver. ie: (based on your current SQL without seeing actual data you have):

    select fL.name, fL.FastestLap as [Fastest lap], 
       d.surname,  d.driverId, fL.year
    from 
    (select r.Name, r.raceId, r.Year, min(t.fastestlaptime) as FastestLap
    from races r
    inner join results t
    on r.raceID=t.raceID
    group by r.Name, r.raceId, r.Year) fL
    inner join results t on t.RaceID = fL.RaceId 
        and t.fastestLapTime = fL.FastestLap
    inner join drivers d on t.driverID = d.driverID
    order by fL.name;
    

    EDIT: Here is a sample with some sample data (I think data structures is odd, but anyway here it is):

    SELECT fL.Name, fL.FastestLap AS [Fastest lap], d.driverName, d.driverId, fL.year
    FROM(SELECT r.Name, r.raceId, r.year, MIN(t.fastestLaptime) AS FastestLap
         FROM races r
              INNER JOIN results t ON r.raceId=t.raceId
         GROUP BY r.Name, r.raceId, r.year) fL
        INNER JOIN results t ON t.raceId=fL.raceId AND t.fastestLaptime=fL.FastestLap
        INNER JOIN drivers d ON t.DriverId=d.driverId
    ORDER BY fL.Name, fL.year;
    
    
    Name Fastest lap driverName driverId year
    AbuDhabi Grand Prix 00:01:16.9850000 Driver 3 3 2022
    AbuDhabi Grand Prix 00:01:09.8800000 Driver 4 4 2023
    Brasilian Grand Prix 00:01:23.0300000 Driver 1 1 2022
    Brasilian Grand Prix 00:01:20.0000000 Driver 2 2 2023
    Monaco Grand Prix 00:01:39.5000000 Driver 2 2 2022
    Monaco Grand Prix 00:01:39.5000000 Driver 3 3 2022
    Monaco Grand Prix 00:01:00.0010000 Driver 1 1 2023

    DBFiddle demo is here