Search code examples
sqlsql-serveraggregategroup

SQL aggregate function when count(*)=1 so there can be only one value


Sometimes you write a grouped query where each group is a single row, as having count(*) = 1. This means that the usual aggregate functions like min, max, sum and so on are a bit pointless: the min equals the max, equals the sum, equals the average. Since there's exactly one value to aggregate.

I usually end up picking min arbitrarily. If we take the familiar example of a table mapping a book to its author(s), I might want to query just books that have a single author:

-- For books that have a single author, pull back that author's id.
select book_id,
       min(author_id) as author_id
       -- I could equally well use max(author_id) or even sum(author_id)...
from book_authors
group by book_id
having count(*) = 1

That works, but it seems it could be expressed better. I'm not actually interested in the 'minimum' per se, but just to get the single value which I know exists. Some column types (such as bit in Microsoft SQL Server) do not support the min aggregate function so you have to do workarounds like convert(bit, min(convert(int, mycol))).

So, I expect the answer will be no, but is there some better way to specify my intent?

select book_id,
       there_must_be_one_value_so_just_return_it(author_id) as author_id
from book_author
group by book_id
having count(*) = 1

Clearly, if you're not requiring count(*)=1 then you no longer guarantee a single value and the special aggregate function could not be used. That error could be caught when the SQL is compiled.

The desired result would be equivalent to the min query above.

I'm using Microsoft SQL Server (2016) but as this is a fairly "blue sky" kind of question, I would be interested in replies about other SQL dialects too.


Solution

  • You could, instead, use a windowed COUNT and then filter based on that:

    WITH CTE AS(
        SELECT ba.book_id,
               ba.author_id,
               COUNT(ba.book_id) OVER (PARTITION BY ba.book_id) AS Authors
        FROM dbo.book_authors ba)
    SELECT c.book_id,
           c.author_id
    FROM CTE c
    WHERE c.Authors = 1;
    

    An alternative method would be to use a correlated subquery:

    SELECT ba.book_id,
           ba.author_id
    FROM dbo.book_authors ba
    WHERE EXISTS (SELECT 1
                  FROM dbo.book_authors e
                  WHERE e.book_id = ba.book_id
                  GROUP BY e.book_id
                  HAVING COUNT(*) = 1);
    

    I have not tested performance on either with a decent amount of data, however, I would hope that for a correlated subquery with a well indexed table, you should see better performance.