I'm working to create a SQL report on answers
table:
id | created_at
1 | 2018-03-02 18:05:56
2 | 2018-04-02 18:05:56
3 | 2018-04-02 18:05:56
4 | 2018-05-02 18:05:56
5 | 2018-06-02 18:05:56
And output is:
weeks_ago | record_count (# of rows per weekly cohort) | growth (%)
-4 | 21 | 22%
-3 | 22 | -12%
-2 | 32 | 2%
-1 | 2 | 20%
0 | 31 | 0%
My query is currently erring with:
1111 - Invalid use of group function
What am I doing wrong here?
SELECT floor(datediff(f.created_at, curdate()) / 7) AS weeks_ago,
count(DISTINCT f.id) AS "New Records in Cohort",
100 * (count(*) - lag(count(*), 1) over (order by f.created_at)) / lag(count(*), 1) over (order by f.created_at) || '%' as growth
FROM answers f
WHERE f.completed_at IS NOT NULL
GROUP BY weeks_ago
HAVING count(*) > 1;
I think you want to find running count of all rows excluding the current row. I think you can ditch the LAG
function as follows:
SELECT
COUNT(*) OVER (ORDER BY f.created_at ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) x, -- running count before current row
COUNT(*) OVER (ORDER BY f.created_at) y -- running count including current row
You can divide and multiply all you want.
Nope. you simply need to separate GROUP BY
and LAG OVER
:
WITH cte AS (
SELECT
FLOOR(DATEDIFF(created_at, CURDATE()) / 7) AS weeks_ago,
COUNT(DISTINCT id) AS new_records
FROM answers
WHERE 1 = 1 -- todo: change this
GROUP BY weeks_ago
HAVING 1 = 1 -- todo: change this
)
SELECT
cte.*,
100 * (
new_records - LAG(new_records) OVER (ORDER BY weeks_ago)
) / LAG(new_records) OVER (ORDER BY weeks_ago) AS percent_increase
FROM cte