Search code examples
sqlsql-servercommon-table-expressionrecursive-cte

How to select all participants who made at least 1 submission each day for everyday?


I have a problem like this in SQL Server. Here is the base table:

submission_date submission_id hacker_id score
2016-03-01 11 1 0
2016-03-01 12 3 15
2016-03-01 13 2 60
2016-03-03 14 1 0
2016-03-03 15 4 60
2016-03-03 16 2 25
2016-03-03 17 2 60
2016-03-08 18 1 0
2016-03-08 19 5 70

Now I want the hacker_id that made at least 1 submission each day for every day (I will know the Start Date and End Date of the competition). Below is my desired output:

submission_date submission_id hacker_id score
2016-03-01 11 1 0
2016-03-01 12 3 15
2016-03-01 13 2 60
2016-03-03 14 1 0
2016-03-03 16 2 25
2016-03-03 17 2 60
2016-03-08 18 1 0

There are 3 hackers on 2016-03-01, all counted since it's the first day. On the next day ( 2016-03-03), only 1 and 2 keep submitting, and 2 submits two times. On the final day, there is only 1 who keeps submitting.

This is quite similar to the problem on Hackerrank, and there were lots of posts here about it. But my case is different, the date is not continuous day by day like in this post, the day is skipped

I've already solved the continuous day problem by using Recursive CTE like below:

WITH cte([submission_date], [submission_id], [hacker_id], [score], BaseDate) AS
(
    SELECT
        [submission_date], [submission_id], [hacker_id], [score], 
        CAST('2016-03-02' AS date) AS BaseDate
    FROM
        [dbo].[bSubmissions]
    WHERE 
        [submission_date] = '2016-03-01'

    UNION ALL

    SELECT
        S.[submission_date], S.[submission_id], S.[hacker_id], 
        S.[score], CAST(DATEADD(DAY, 1, C.BaseDate) AS date)
    FROM
        [dbo].[bSubmissions] AS S 
    JOIN 
        cte AS C ON C.hacker_id = S.hacker_id 
                 AND C.BaseDate = S.[submission_date]
)
select * from cte

But it won't work for this case. My idea is still the same, select the hackers that have posted on the previous cte table (all the previous days), add (union) them to the current cte table (by joining with special conditions), and then continue the recursion.

But I don't know how to properly set up this Recursive CTE.

My other idea is to create a procedure which takes hacker_id, start_date, current_date and checks if that hacker posts every day during the period. But let that idea aside. I want to focus on the recursive/normal method first.

Insert data query for anyone who needed:

CREATE TABLE bSubmissions
(
   submission_date DATE  NOT NULL
  ,submission_id   INTEGER  NOT NULL
  ,hacker_id       INTEGER  NOT NULL
  ,score           INTEGER  NOT NULL
);

INSERT INTO bSubmissions(submission_date,submission_id,hacker_id,score) 
VALUES
 ('2016-03-01',11,1,0)
,('2016-03-01',12,3,15)
,('2016-03-01',13,2,60)
,('2016-03-03',14,1,0)
,('2016-03-03',15,4,60)
,('2016-03-03',16,2,25)
,('2016-03-03',17,2,60)
,('2016-03-08',18,1,0)
,('2016-03-08',19,5,70);

Solution

  • Instead of using a RECURSIVE CTE, you can use two ranking indices:

    • one for each submission date
    • one for each hacker submission (partitioned by "hacker_id")

    In this way, if a hacker does not submit one day, he won't be considered anymore in the competition (the ranking indices will differ).

    In order to make sure to get all indices correctly, we can use the DENSE_RANK window function, that will assign the same rank to competitions in the same day, yet keeping the rankings consecutive.

    WITH cte AS (
        SELECT *,
               DENSE_RANK() OVER(
                   ORDER BY submission_date    ) AS rn_competition,
               DENSE_RANK() OVER(
                   PARTITION BY hacker_id 
                   ORDER     BY submission_date) AS rn_submission
        FROM tab
    )
    SELECT *
    FROM cte
    WHERE rn_competition = rn_submission
    ORDER BY submission_id
    

    Check the demo here.

    Note: the last ORDER BY submission_id clause is not required.