Search code examples
sqlsyntax-errorcommon-table-expressionlooker

Retrieving data with each ID having a unique timestamp from which to retrieve data from


Currently using Looker SQL Runner and I need to retrieve the total number of sessions that have occurred for each website ID and each website ID has different timepoints at which I need to retrieve the data from the beginning up until that timepoint.

The format would be something like this:

website_id date total_sessions total_users
1 2021-08-13 100 98
2 2021-07-15 125 105
3 2021-06-29 98 85

Right now, my code looks like this:

with base as
(
SELECT
      website_id
      ,session_count
      ,user_id
      ,'timestamp'
FROM page_views_y2
WHERE website_id in (100604,100577,1078,100501)
GROUP BY 1,2,3
),

min_time as
(
SELECT
      website_id
      ,session_count
      ,user_id
      ,min(timestamp) as session_start
FROM page_views_y2
WHERE website_id in (100604,100577,1078,100501)
GROUP BY 1,2,3
),

combined_count AS
(
SELECT
      base.website_id
      ,session_start
      ,COUNT(DISTINCT min_time.user_id||session_start||min_time.session_count) AS distinct_combined_count
      ,COUNT(DISTINCT base.user_id) AS distinct_users
FROM base
LEFT JOIN min_time
ON base.user_id=min_time.user_id
AND base.website_id=min_time.website_id
GROUP BY 1,2
)

SELECT
      website_id
      ,CASE 
        WHEN website_id=100604 AND session_start < (timestamp '2021-07-27') THEN distinct_combined_count
        WHEN website_id in (100577,1078) AND session_start < (timestamp '2021-07-26') THEN distinct_combined_count
        WHEN website_id=100501 AND session_start < (timestamp '2021-07-16') THEN distinct_combined_count
      ELSE 0 END AS final_count
      ,CASE 
        WHEN website_id=100604 AND session_start < (timestamp '2021-07-27') THEN distinct_users
        WHEN website_id in (100577,1078) AND session_start < (timestamp '2021-07-26') THEN distinct_users
        WHEN website_id=100501 AND session_start < (timestamp '2021-07-16') THEN distinct_users
      ELSE 0 END AS final_distinct_users
FROM combined_count
GROUP BY 1

But I keep running into ERROR: column "combined_count.session_start" must appear in the GROUP BY clause or be used in an aggregate function despite it clearly in group by.

Also this is only four website IDs, of which I still need to process another 30, so if I can get these four working then I can scale it out to the whole list of ID's and their respective timepoints.

Thank you for your time.


Solution

  • You need to aggregate the "CASE" statements.