I have a database with 3 tables: Cohort time periods, users and events.
Cohorts have many users, each user has many events. Cohorts also have periods of time associated with them. I'd like to know for each cohort, for each period of time how many events occurred.
If there were 2 tables it would be easy to do a CROSS JOIN
but I'm stuck when there's this intermediary table.
Here's the DB structure:
create table time_periods (
cohort_name varchar,
period_name varchar,
start_time timestamp,
end_time timestamp);
create table users (
cohort_name varchar,
user_name varchar
);
create table events (
user_name varchar,
ts timestamp);
insert into time_periods values
('cohort1', 'first', '2017-01-01', '2017-01-10'),
('cohort1', 'second', '2017-01-10', '2017-01-20'),
('cohort2', 'first', '2017-01-15', '2017-01-20');
insert into users values
('cohort1', 'alice'),
('cohort2', 'bob');
insert into events values
('alice', '2017-01-07'),
('alice', '2017-01-17'),
('bob', '2017-01-18');
This is as far as I can get with the SQL - doing a triple cross join but it is not correct - the result is 6 events, when it should be just 1 per row.
select
time_periods.cohort_name,
period_name,
count(ts)
from time_periods, users, events
group by 1, 2
order by time_periods.cohort_name
Here's the SQLFiddle:
You need to specify on which columns you want to join the tables If I understand your data correctly you want something like this:
select
tp.cohort_name,
tp.period_name,
count(*)
from time_periods tp
inner join users u on tp.cohort_name = u.cohort_name
inner join events e on u.user_name = e.user_name and e.ts between tp.start_time and tp.end_time
group by 1, 2
order by tp.cohort_name
In here you join from time_periods
to users
only for users in correct cohort and then join to events
only for specified users and events in particular time period, then grouping by 1 and 2 to get the correct even count