Search code examples
sqlsubquerycommon-table-expressionwindow-functions

Syntax error in WITH clause and checking a query


I have the following sql problem to solve: Table: Events, with the following columns: column name: business_id and type:int column name: event_type and type: varchar column name: occurences and type: int

(business_id, event_type) is the primary key of this table. Each row in the table logs the info that an event of some type occured at some business for a number of times.

Write an SQL query to find all active businesses, where an active business is a business that has more than one event type with occurences greater than the average occurences of that event type among all businesses.

The query result format is in the following example:

Events table: {"headers":{"Events":["business_id","event_type","occurences"]},"rows":{"Events":[[1,"reviews",7],[3,"reviews",3],[1,"ads",11],[2,"ads",7],[3,"ads",6],[1,"page views",3],[2,"page views",12]]}}

Result table for the above would have one column called business_id and it will have one row with 1.

Explanation: Average for 'reviews', 'ads' and 'page views' are (7+3)/2=5, (11+7+6)/3=8, (3+12)/2=7.5, respectively. Only business with id 1 has 7 'reviews' events (more than 5) and 11 'ads' events (more than 8) so it is the only active business.

I wrote the following query:

WITH firstTable (event_type,avg_o) as (SELECT Events.event_type, avg(occurences) as avg_o
FROM Events
GROUP BY event_type)

WITH secondTable (business_id,event_type,avg2,g) as (SELECT Events.business_id, Events.event_type,firstTable.avg_o,
CASE
    WHEN Events.occurences>firstTable.avg_o THEN 1
    ELSE 0
END as g
FROM Events
JOIN firstTable
ON Events.event_type = firstTable.event_type)

WITH thirdTable as (SELECT business_id, sum(g) as g2
FROM secondTable
GROUP BY business_id)

SELECT business_id
FROM thirdTable
WHERE g2>1;

Questions, please:

  1. There seems to be syntax errors in my query. Could you help me identify what they are and how to correct them?
  2. Does my query work for solving the problem?
  3. I believe there is a solution using window functions. Is that better, performance wise, than my approach that uses CTEs as you can see above?

Your help will be greatly appreciated. Thanks a lot!

Aly


Solution

  • Inspired by the excellent answer from GMB, here is a simpler CTE answer:

    WITH 
    firstTable (event_type,avg_o) as (SELECT Events.event_type, avg(occurences) as avg_o
    FROM Events
    GROUP BY Events.event_type)
    
    SELECT Events.business_id
    FROM Events
    JOIN firstTable
    ON Events.event_type=firstTable.event_type
    WHERE occurences>avg_o
    GROUP BY Events.business_id
    Having count(*)>1;
    

    And another simpler sub-query answer:

    SELECT Events.business_id
    FROM Events
    JOIN (SELECT Events.event_type, avg(occurences) as avg_o
    FROM Events
    GROUP BY Events.event_type) as firstTable
    ON Events.event_type=firstTable.event_type
    WHERE occurences>avg_o
    GROUP BY Events.business_id
    Having count(*)>1;
    

    Also, the main syntax error in my first answer in the question itself is, like a_horse_with_no_name mentioned, that there should be only 1 WITH even with multiple CTEs.