Currently using Looker SQL Runner and I need to retrieve the total number of sessions that have occurred for each website ID and each website ID has different timepoints at which I need to retrieve the data from the beginning up until that timepoint.
The format would be something like this:
website_id | date | total_sessions | total_users |
---|---|---|---|
1 | 2021-08-13 | 100 | 98 |
2 | 2021-07-15 | 125 | 105 |
3 | 2021-06-29 | 98 | 85 |
Right now, my code looks like this:
with base as
(
SELECT
website_id
,session_count
,user_id
,'timestamp'
FROM page_views_y2
WHERE website_id in (100604,100577,1078,100501)
GROUP BY 1,2,3
),
min_time as
(
SELECT
website_id
,session_count
,user_id
,min(timestamp) as session_start
FROM page_views_y2
WHERE website_id in (100604,100577,1078,100501)
GROUP BY 1,2,3
),
combined_count AS
(
SELECT
base.website_id
,session_start
,COUNT(DISTINCT min_time.user_id||session_start||min_time.session_count) AS distinct_combined_count
,COUNT(DISTINCT base.user_id) AS distinct_users
FROM base
LEFT JOIN min_time
ON base.user_id=min_time.user_id
AND base.website_id=min_time.website_id
GROUP BY 1,2
)
SELECT
website_id
,CASE
WHEN website_id=100604 AND session_start < (timestamp '2021-07-27') THEN distinct_combined_count
WHEN website_id in (100577,1078) AND session_start < (timestamp '2021-07-26') THEN distinct_combined_count
WHEN website_id=100501 AND session_start < (timestamp '2021-07-16') THEN distinct_combined_count
ELSE 0 END AS final_count
,CASE
WHEN website_id=100604 AND session_start < (timestamp '2021-07-27') THEN distinct_users
WHEN website_id in (100577,1078) AND session_start < (timestamp '2021-07-26') THEN distinct_users
WHEN website_id=100501 AND session_start < (timestamp '2021-07-16') THEN distinct_users
ELSE 0 END AS final_distinct_users
FROM combined_count
GROUP BY 1
But I keep running into ERROR: column "combined_count.session_start" must appear in the GROUP BY clause or be used in an aggregate function despite it clearly in group by.
Also this is only four website IDs, of which I still need to process another 30, so if I can get these four working then I can scale it out to the whole list of ID's and their respective timepoints.
Thank you for your time.
You need to aggregate the "CASE" statements.