Search code examples
sqlsql-servergroup-byhaving

Use of HAVING without GROUP BY not working as expected


I am starting to learn SQL Server, in the documentation found in msdn states like this

HAVING is typically used with a GROUP BY clause. When GROUP BY is not used, there is an implicit single, aggregated group.

This made me to think that we can use having without a groupBy clause, but when I am trying to make a query I am not able to use it.

I have a table like this

CREATE TABLE [dbo].[_abc]
(
    [wage] [int] NULL
) ON [PRIMARY]
GO

INSERT INTO [dbo].[_abc] (wage)
VALUES (4), (8), (15), (30), (50) 
GO

Now when I run this query, I get an error

select * 
from [dbo].[_abc]
having sum(wage) > 5

Error:

enter image description here


Solution

  • The documentation is correct; i.e. you could run this statement:

    select sum(wage) sum_of_all_wages
    , count(1) count_of_all_records
    from [dbo].[_abc] 
    having sum(wage) > 5
    

    The reason your statement doesn't work is because of the select *, which means select every columns' value. When there is no group by, all records are aggregated; i.e. you only get 1 record in your result set which has to represent every record. As such, you can only* include values provided by applying aggregate functions to your columns; not the columns themselves. * of course, you can also provide constants, so select 'x' constant, count(1) cnt from myTable would work.

    There aren't many use cases I can think of where you'd want to use having without a group by, but certainly it can be done as shown above.

    NB: If you wanted all rows where the wage was greater than 5, you'd use the where clause instead:

    select * 
    from [dbo].[_abc] 
    where wage > 5
    

    Equally, if you want the sum of all wages greater than 5 you can do this

    select sum(wage) sum_of_wage_over_5 
    from [dbo].[_abc] 
    where wage > 5
    

    Or if you wanted to compare the sum of wages over 5 with those under:

    select case when wage > 5 then 1 else 0 end wage_over_five
    , sum(wage) sum_of_wage
    from [dbo].[_abc] 
    group by case when wage > 5 then 1 else 0 end 
    

    See runnable examples here.


    Update based on comments:

    Do you need having to use aggregate functions?

    No. You can run select sum(wage) from [dbo].[_abc]. When an aggregate function is used without a group by clause, it's as if you're grouping by a constant; i.e. select sum(wage) from [dbo].[_abc] group by 1.

    The documentation merely means that whilst normally you'd have a having statement with a group by statement, it's OK to exclude the group by / in such cases the having statement, like the select statement, will treat your query as if you'd specified group by 1

    What's the point?

    It's hard to think of many good use cases, since you're only getting one row back and the having statement is a filter on that.

    One use case could be that you write code to monitor your licenses for some software; if you have less users than per-user-licenses all's good / you don't want to see the result since you don't care. If you have more users you want to know about it. E.g.

    declare @totalUserLicenses int = 100
    select count(1) NumberOfActiveUsers
    , @totalUserLicenses NumberOfLicenses
    , count(1) - @totalUserLicenses NumberOfAdditionalLicensesToPurchase
    from [dbo].[Users]
    where enabled = 1
    having count(1) > @totalUserLicenses 
    

    Isn't the select irrelevant to the having clause?

    Yes and no. Having is a filter on your aggregated data. Select says what columns/information to bring back. As such you have to ask "what would the result look like?" i.e. Given we've had to effectively apply group by 1 to make use of the having statement, how should SQL interpret select *? Since your table only has one column this would translate to select wage; but we have 5 rows, so 5 different values of wage, and only 1 row in the result to show this.

    I guess you could say "I want to return all rows if their sum is greater than 5; otherwise I don't want to return any rows". Were that your requirement it could be achieved a variety of ways; one of which would be:

    select *
    from [dbo].[_abc] 
    where exists 
    (
        select 1 
        from [dbo].[_abc] 
        having sum(wage) > 5
    ) 
    

    However, we have to write the code to meet the requirement, rather than expect the code to understand our intent.

    Another way to think about having is as being a where statement applied to a subquery. I.e. your original statement effectively reads:

    select wage
    from
    (
        select sum(wage) sum_of_wage
        from [dbo].[_abc]
        group by 1
    ) singleRowResult
    where sum_of_wage > 5
    

    That won't run because wage is not available to the outer query; only sum_of_wage is returned.