I have the following sql problem to solve: Table: Events, with the following columns: column name: business_id and type:int column name: event_type and type: varchar column name: occurences and type: int
(business_id, event_type) is the primary key of this table. Each row in the table logs the info that an event of some type occured at some business for a number of times.
Write an SQL query to find all active businesses, where an active business is a business that has more than one event type with occurences greater than the average occurences of that event type among all businesses.
The query result format is in the following example:
Events table: {"headers":{"Events":["business_id","event_type","occurences"]},"rows":{"Events":[[1,"reviews",7],[3,"reviews",3],[1,"ads",11],[2,"ads",7],[3,"ads",6],[1,"page views",3],[2,"page views",12]]}}
Result table for the above would have one column called business_id and it will have one row with 1.
Explanation: Average for 'reviews', 'ads' and 'page views' are (7+3)/2=5, (11+7+6)/3=8, (3+12)/2=7.5, respectively. Only business with id 1 has 7 'reviews' events (more than 5) and 11 'ads' events (more than 8) so it is the only active business.
I wrote the following query:
WITH firstTable (event_type,avg_o) as (SELECT Events.event_type, avg(occurences) as avg_o
FROM Events
GROUP BY event_type)
WITH secondTable (business_id,event_type,avg2,g) as (SELECT Events.business_id, Events.event_type,firstTable.avg_o,
CASE
WHEN Events.occurences>firstTable.avg_o THEN 1
ELSE 0
END as g
FROM Events
JOIN firstTable
ON Events.event_type = firstTable.event_type)
WITH thirdTable as (SELECT business_id, sum(g) as g2
FROM secondTable
GROUP BY business_id)
SELECT business_id
FROM thirdTable
WHERE g2>1;
Questions, please:
Your help will be greatly appreciated. Thanks a lot!
Aly
Inspired by the excellent answer from GMB, here is a simpler CTE answer:
WITH
firstTable (event_type,avg_o) as (SELECT Events.event_type, avg(occurences) as avg_o
FROM Events
GROUP BY Events.event_type)
SELECT Events.business_id
FROM Events
JOIN firstTable
ON Events.event_type=firstTable.event_type
WHERE occurences>avg_o
GROUP BY Events.business_id
Having count(*)>1;
And another simpler sub-query answer:
SELECT Events.business_id
FROM Events
JOIN (SELECT Events.event_type, avg(occurences) as avg_o
FROM Events
GROUP BY Events.event_type) as firstTable
ON Events.event_type=firstTable.event_type
WHERE occurences>avg_o
GROUP BY Events.business_id
Having count(*)>1;
Also, the main syntax error in my first answer in the question itself is, like a_horse_with_no_name mentioned, that there should be only 1 WITH even with multiple CTEs.