I have recently started learning SQL, set up a server, tried to get some practice with a kaggle DB I have found.
I would like to see the recorder on each lap (meaning track name field should be distinct), however it lists all lap records instead.
I have tried:
select distinct
r.name, min(t.fastestlaptime) as 'Fastest lap', d.surname, d.driverId, r.year
from
races r
inner join
results t on r.raceID = t.raceID
inner join
drivers d on t.driverID = d.driverID
where
(t.fastestlaptime not in ('\N'))
group by
r.name, d.forename, d.surname, d.driverId, r.year
order by
r.name
I get this result:
name | Fastest lap | surname | driverId | year |
---|---|---|---|---|
70th Anniversary Grand Prix | 1:28.451 | Hamilton | 1 | 2020 |
70th Anniversary Grand Prix | 1:29.465 | Verstappen | 830 | 2020 |
70th Anniversary Grand Prix | 1:29.477 | Albon | 848 | 2020 |
Abu Dhabi Grand Prix | 1:26.103 | Verstappen | 830 | 2021 |
Abu Dhabi Grand Prix | 1:26.419 | Pérez | 815 | 2021 |
Abu Dhabi Grand Prix | 1:26.615 | Hamilton | 1 | 2021 |
Australian Grand Prix | 1:28.321 | Villeneuve | 35 | 2006 |
Australian Grand Prix | 1:28.336 | Pérez | 815 | 2017 |
Australian Grand Prix | 1:28.416 | Glock | 10 | 2009 |
ofc, I just copy pasted some of the results, there are 7.6k rows, instead of just showing the rows with unique track names.
Can someone please explain to me? I'm malding not understanding the concept :D
I would like to list distinct track names, and the lap record, name of the recorder, recorder ID and year next to it.
In your SQL distinct is redundant. You are grouping by "d.forename, d.surname, d.driverId, r.year" besided "r.name", thus you would get the min() per:
r.Name and then per each name, d.forename, d.surname, d.driverId, r.year as well. That means, for example say, Lando Norris didn't have the fastest on any tracks on any year but was on the grid, then you would also see all of Norris' fastest laps on all tracks on all years he was on the grid as well.
If what you want is to list who got the fastest lap on each track and the timing, each year then what you need is to get the min() per track, year and then get the other info of the driver. ie: (based on your current SQL without seeing actual data you have):
select fL.name, fL.FastestLap as [Fastest lap],
d.surname, d.driverId, fL.year
from
(select r.Name, r.raceId, r.Year, min(t.fastestlaptime) as FastestLap
from races r
inner join results t
on r.raceID=t.raceID
group by r.Name, r.raceId, r.Year) fL
inner join results t on t.RaceID = fL.RaceId
and t.fastestLapTime = fL.FastestLap
inner join drivers d on t.driverID = d.driverID
order by fL.name;
EDIT: Here is a sample with some sample data (I think data structures is odd, but anyway here it is):
SELECT fL.Name, fL.FastestLap AS [Fastest lap], d.driverName, d.driverId, fL.year
FROM(SELECT r.Name, r.raceId, r.year, MIN(t.fastestLaptime) AS FastestLap
FROM races r
INNER JOIN results t ON r.raceId=t.raceId
GROUP BY r.Name, r.raceId, r.year) fL
INNER JOIN results t ON t.raceId=fL.raceId AND t.fastestLaptime=fL.FastestLap
INNER JOIN drivers d ON t.DriverId=d.driverId
ORDER BY fL.Name, fL.year;
Name | Fastest lap | driverName | driverId | year |
---|---|---|---|---|
AbuDhabi Grand Prix | 00:01:16.9850000 | Driver 3 | 3 | 2022 |
AbuDhabi Grand Prix | 00:01:09.8800000 | Driver 4 | 4 | 2023 |
Brasilian Grand Prix | 00:01:23.0300000 | Driver 1 | 1 | 2022 |
Brasilian Grand Prix | 00:01:20.0000000 | Driver 2 | 2 | 2023 |
Monaco Grand Prix | 00:01:39.5000000 | Driver 2 | 2 | 2022 |
Monaco Grand Prix | 00:01:39.5000000 | Driver 3 | 3 | 2022 |
Monaco Grand Prix | 00:01:00.0010000 | Driver 1 | 1 | 2023 |