Search code examples
sqlpostgresqlsubquerywhere-clausecommon-table-expression

WHERE clause does not find column after a CTE?


New to CTE's and subqueries in SQL.
I have 3 tables:

categories (category_code, category)
countries (country_code, country, continent)
businesses (business, year_founded, category_code, country_code)

Goal is to look at oldest businesses in the world. I used a CTE:

WITH bus_cat_cont AS (
    SELECT business, year_founded, category, country,   
           continent
    FROM businesses AS b
    INNER JOIN categories AS c1
    ON b.category_code = c1.category_code
    INNER JOIN countries AS c2
    ON b.country_code = c2.country_code
    )
SELECT continent,
       category,
       COUNT(business) AS n
FROM bus_cat_cont
WHERE n > 5
GROUP BY continent, category
ORDER BY n DESC;

The code works without WHERE n > 5. But after adding that, I get the error:

column "n" does not exist

I realized there is a much easier way to get the output I want without a CTE.
But I'm wondering: Why do I get this error?


Solution

  • This would work:

    WITH bus_cat_cont AS (
       SELECT business, year_founded, category, country, continent
       FROM   businesses AS b
       JOIN   categories AS c1 ON b.category_code = c1.category_code
       JOIN   countries  AS c2 ON b.country_code = c2.country_code
       )
    SELECT continent, category, count(business) AS n
    FROM   bus_cat_cont
    -- WHERE  n > 5                -- wrong
    GROUP  BY continent, category
    HAVING count(business) > 5     -- right
    ORDER  BY n DESC;
    

    The output column name "n" is not visible (yet) in the WHERE or HAVING clause. Consider the sequence of events in an SQL query:

    For the record, the result has no obvious connection to your declared goal to "look at oldest businesses in the world". year_founded is unused in the query.
    You get the most common continent/category combinations among businesses.

    Aside, probably better:

    SELECT co.continent, ca.category, n
    FROM  (
       SELECT category_code, country_code, count(*) AS n
       FROM   businesses
       GROUP  BY 1, 2
       HAVING count(*) > 5
       ) b
    JOIN   categories ca USING (category_code)
    JOIN   countries  co USING (country_code)
    ORDER  BY n DESC;
    

    There is really no need for a CTE.

    Aggregate first, join later. See:

    Beside being faster, this is also safer. While category_code, country_code should be defined UNIQUE, the same may not be true for continent and category. (You may want to output codes additionally to disambiguate.)

    count(*) is implemented separately and slightly faster - and equivalent while business is defined NOT NULL.