Currently, I have a query that works fine:
SELECT teamID, yearID, AVG(HBP) as avgHBP FROM Batting \
GROUP BY teamID, yearID \
ORDER BY avgHBP DESC \
LIMIT 5;
RESULT:
| teamID | yearID | avgHBP |
|--------|--------|--------|
| BLN | 1898 | 6.67 |
| BL3 | 1891 | 6.16 |
| BLN | 1897 | 5.75 |
| BLN | 1896 | 5.45 |
| BLN | 1895 | 5.30 |
However, when I try to add a subquery (for no other reason than to try learning subqueries), I get this:
SELECT teamID, (select yearID from Batting) as yearID2, AVG(HBP) as avgHBP FROM Batting \
GROUP BY teamID, yearID2 \
ORDER BY avgHBP DESC \
LIMIT 5;
RESULT:
| teamID | yearID | avgHBP |
|--------|--------|--------|
| BLN | 1871 | 4.44 |
| CL5 | 1871 | 3.93 |
| BL3 | 1871 | 3.86 |
| BFP | 1871 | 3.69 |
| TL2 | 1871 | 3.55 |
where only the first year (1871) is considered for all columns.
Is this because the subquery is applied after the group by
? If I wanted to add a subquery, what would be the proper way to do it?
The subquery:
select yearID from Batting
returns more than 1 rows (actually it returns as many rows as there are in the table Batting
) and in any other rdbms, this query:
SELECT teamID, (select yearID from Batting) as yearID2, AVG(HBP) as avgHBP
FROM Batting
GROUP BY teamID, yearID2
is invalid.
But SQLite allows it and returns just 1 row, which is undefined.
In short your query is invalid SQL, although it does not throw an error (in SQLite only).
The results that you get by are the same as if you executed:
SELECT teamID, AVG(HBP) as avgHBP
FROM Batting
GROUP BY teamID
with 1 extra column which is the undefined returned yearID
of the subquery.
For your question:
If I wanted to add a subquery, what would be the proper way to do it?
the answer is that there is no general rule on how to add a subquery.
It depends on the requirement.
For example if you used this subquery:
select max(yearID) from Batting
which returns only 1 row, then you would have a valid query.