Sometimes you write a grouped query where each group is a single row, as having count(*) = 1
. This means that the usual aggregate functions like min
, max
, sum
and so on are a bit pointless: the min equals the max, equals the sum, equals the average. Since there's exactly one value to aggregate.
I usually end up picking min
arbitrarily. If we take the familiar example of a table mapping a book to its author(s), I might want to query just books that have a single author:
-- For books that have a single author, pull back that author's id.
select book_id,
min(author_id) as author_id
-- I could equally well use max(author_id) or even sum(author_id)...
from book_authors
group by book_id
having count(*) = 1
That works, but it seems it could be expressed better. I'm not actually interested in the 'minimum' per se, but just to get the single value which I know exists. Some column types (such as bit
in Microsoft SQL Server) do not support the min
aggregate function so you have to do workarounds like convert(bit, min(convert(int, mycol)))
.
So, I expect the answer will be no, but is there some better way to specify my intent?
select book_id,
there_must_be_one_value_so_just_return_it(author_id) as author_id
from book_author
group by book_id
having count(*) = 1
Clearly, if you're not requiring count(*)=1
then you no longer guarantee a single value and the special aggregate function could not be used. That error could be caught when the SQL is compiled.
The desired result would be equivalent to the min
query above.
I'm using Microsoft SQL Server (2016) but as this is a fairly "blue sky" kind of question, I would be interested in replies about other SQL dialects too.
You could, instead, use a windowed COUNT
and then filter based on that:
WITH CTE AS(
SELECT ba.book_id,
ba.author_id,
COUNT(ba.book_id) OVER (PARTITION BY ba.book_id) AS Authors
FROM dbo.book_authors ba)
SELECT c.book_id,
c.author_id
FROM CTE c
WHERE c.Authors = 1;
An alternative method would be to use a correlated subquery:
SELECT ba.book_id,
ba.author_id
FROM dbo.book_authors ba
WHERE EXISTS (SELECT 1
FROM dbo.book_authors e
WHERE e.book_id = ba.book_id
GROUP BY e.book_id
HAVING COUNT(*) = 1);
I have not tested performance on either with a decent amount of data, however, I would hope that for a correlated subquery with a well indexed table, you should see better performance.