Search code examples
sqlsqliteaggregationhaving

Difference between HAVING and WHERE in SQL


I've seen in other questions that the difference between HAVING and WHERE in SQL is that HAVING is used post-aggregation whereas WHERE is used pre-aggregation. However, I am still unsure about when to use pre-aggregation filtering or post-aggregation filtering.

As a concrete example, why don't these two queries yield the same result (the second sums quantity prematurely in a way that squashes the GROUP BY call)?

Using WHERE to obtain number of condo sales of each real estate agent.

SELECT agentId, SUM(quantity) total_sales 
  FROM sales s, houses h
  WHERE s.houseId = h.houseId AND h.type = "condo"
  GROUP BY agentId
  ORDER BY total_sales;

Attempted use of HAVING to obtain the same quantity as above.

SELECT agentId, SUM(quantity) total_sales 
  FROM sales s, houses h
  GROUP BY agentId
  HAVING s.houseId = h.houseId AND h.type = "condo"
  ORDER BY total_sales;

Note: these were written/tested/executed in sqlite3.


Solution

  • The simple way to think about it is to consider the order in which the steps are applied.

    Step 1: Where clause filters data

    Step 2: Group by is implemented (SUM / MAX / MIN / ETC)

    Step 3: Having clause filters the results

    So in your 2 examples:

    SELECT agentId, SUM(quantity) total_sales 
    FROM sales s, houses h
    WHERE s.houseId = h.houseId AND h.type = "condo"
    GROUP BY agentId
    ORDER BY total_sales;
    

    Step 1: Filter by HouseId and Condo

    Step 2: Add up the results (number of houses that match the houseid and condo)

    SELECT agentId, SUM(quantity) total_sales 
    FROM sales s, houses h
    GROUP BY agentId
    HAVING s.houseId = h.houseId AND h.type = "condo"
    ORDER BY total_sales;
    

    Step 1: No Filter

    Step 2: Add up quantity of all houses

    Step 3: Filter the results by houseid and condo.

    Hopefully this clears up what is happening.

    The easiest way to decide which you should use is: - Use WHERE to filter the data - Use HAVING to filter the results of an aggregation (SUM / MAX / MIN / ETC)