Difference between HAVING and WHERE in SQL

I've seen in other questions that the difference between HAVING and WHERE in SQL is that HAVING is used post-aggregation whereas WHERE is used pre-aggregation. However, I am still unsure about when to use pre-aggregation filtering or post-aggregation filtering.

As a concrete example, why don't these two queries yield the same result (the second sums quantity prematurely in a way that squashes the GROUP BY call)?

Using WHERE to obtain number of condo sales of each real estate agent.

SELECT agentId, SUM(quantity) total_sales 
  FROM sales s, houses h
  WHERE s.houseId = h.houseId AND h.type = "condo"
  GROUP BY agentId
  ORDER BY total_sales;

Attempted use of HAVING to obtain the same quantity as above.

SELECT agentId, SUM(quantity) total_sales 
  FROM sales s, houses h
  GROUP BY agentId
  HAVING s.houseId = h.houseId AND h.type = "condo"
  ORDER BY total_sales;

Note: these were written/tested/executed in sqlite3.

Solution

The simple way to think about it is to consider the order in which the steps are applied.

Step 1: Where clause filters data

Step 2: Group by is implemented (SUM / MAX / MIN / ETC)

Step 3: Having clause filters the results

So in your 2 examples:

SELECT agentId, SUM(quantity) total_sales 
FROM sales s, houses h
WHERE s.houseId = h.houseId AND h.type = "condo"
GROUP BY agentId
ORDER BY total_sales;

Step 1: Filter by HouseId and Condo

Step 2: Add up the results (number of houses that match the houseid and condo)

SELECT agentId, SUM(quantity) total_sales 
FROM sales s, houses h
GROUP BY agentId
HAVING s.houseId = h.houseId AND h.type = "condo"
ORDER BY total_sales;

Step 1: No Filter

Step 2: Add up quantity of all houses

Step 3: Filter the results by houseid and condo.

Hopefully this clears up what is happening.

The easiest way to decide which you should use is: - Use WHERE to filter the data - Use HAVING to filter the results of an aggregation (SUM / MAX / MIN / ETC)