I'm trying to track and display a user's daily streak posting to my app, but struggling to write a query that works reliably and returns an accurate count.
My app has a prompt
and a post
table. A user can submit one post for each prompt (the prompts are created daily, so one post per day, per user).
The prompt
table, simplified, looks like this:
id | dateKey | text |
---|---|---|
1 | 20240101 | This is an example prompt. |
2 | 20240102 | Here is a second prompt. |
The post
table, simplified, is something like:
id | content | promptId | authorId |
---|---|---|---|
50 | This is my response to the prompt. | 1 | 90 |
51 | A second response to the same prompt. | 1 | 91 |
I've tried a few different approaches to queries (using PARTITION BY
, dense_rank()
etc.) but could only get the user's longest streak. The query I had if you're interested:
select distinct on (p."authorId") count(distinct "dateKey"::date) as "streak"
from (select p.*,
dense_rank() over (partition by p."authorId" order by "dateKey"::date) as seq
from post p
join prompt pt on p."promptId" = pt.id
) p
join prompt pt on p."promptId" = pt.id
where p."authorId" = 90
group by p."authorId", "dateKey"::date - seq * interval '1 day'
order by p."authorId", streak desc
This appears to work with the below data, but if you add a new "missed" prompt (which should reset the streak), this query would still return 2 (I think I understand why, but not sure how to correct it).
I basically need something to start at the latest prompt, and go down the list until it finds a prompt with no post from that user.
For example, this joined data would have a streak of 2:
id | dateKey | text | postContent | authorId |
---|---|---|---|---|
1 | 20240104 | This is an example prompt. | This is my response. | 90 |
2 | 20240103 | Here is a second prompt. | A second response. | 90 |
3 | 20240102 | A third prompt. | null | null |
4 | 20240101 | My fourth prompt. | A third response, but I missed a day. | 90 |
Functionality most important, but would be nice if it were performant too (prompt
might have 1000 rows, post
might have millions, and the streak
might reach 1000).
I'm a bit lost with PostgreSQL's capabilities for things like this, so hopefully there's a simple solution out there!
Fiddle: https://www.db-fiddle.com/f/4jyoMCicNSZpjMt4jFYoz5/11431
For a single table with natural value progression there are simpler solutions. But for the combination of two tables with a (seemingly) arbitrary next promptId
I expect a recursive CTE to perform best:
WITH RECURSIVE cte AS (
SELECT CURRENT_DATE AS the_day, p."authorId" AS author_id
FROM prompt pt
JOIN post p ON p."promptId" = pt.id
WHERE pt."dateKey" = CURRENT_DATE
AND p."authorId" = 90 -- your author here!
UNION ALL
SELECT c.the_day - 1, p."authorId" -- assuming no gaps in prompt!
FROM cte c
JOIN prompt pt ON pt."dateKey" = c.the_day - 1
JOIN post p ON p."promptId" = pt.id
WHERE p."authorId" = c.author_id
)
SELECT count(*)
FROM cte;
Absolutely requires index support to be fast.
Ideally, one index on prompt("dateKey", id)
, and one on post("authorId", "promptId")
.
Assuming ...
prompt."dateKey"
is type date
(as it should be).Related:
If there is at most one prompt per day, consider using the date (data type date
!) as PK in table prompt
and FK in table post
. Allows for much simpler queries. See: