Search code examples
sqloracle-databaserecursion

How do I perform recursive search in Oracle


I am currently encountering challenges in comprehending the implementation of recursive search within Oracle SQL.

My encounter with a HackerRank question necessitates the application of this concept. Although I refrain from disclosing the specific HackerRank inquiry, I intend to leverage it as a reference to devise the logic independently. However, I seek assistance in dissecting the problem and understanding its intricacies.

Should anyone possess alternative methodologies to achieve the desired outcome, I am open to exploring diverse solutions. Your contributions and insights are highly valued, and I eagerly anticipate your responses.

Below is my sceanrio.

I have a table called : employee

It as 3 columns -> emp_id , emp_date , emp_task

Data is like below

EMP_ID | EMP_DATE   | EMP_TASK
A1     | 01-04-2024 | 345
A2     | 01-04-2024 | 546
A3     | 01-04-2024 | 232
A4     | 01-04-2024 | 8000
A5     | 01-04-2024 | 2344
A1     | 02-04-2024 | 456
A2     | 02-04-2024 | 9280
A3     | 02-04-2024 | 324
A2     | 02-04-2024 | 754
A8     | 02-04-2024 | 75 
A2     | 03-04-2024 | 400
A3     | 03-04-2024 | 234
A3     | 04-04-2024 | 100

And the condition is to calculate the total number of emp_id participated.

It is clearly mentioned that : get count of those emp_id who participated in each day.

Make sure they were their from initil earlier date.

Here is the logic I need to implement recursively but not known how to start writing query ?.

If you see Day 1 : 01-04-2024 -> emp_id participated -> A1 , A2 , A3 , A4 , A5

If you see Day 2 : 02-04-2024 -> emp_id participated -> A1 , A2 , A3`. They are the one who participated in Day 1 as well so no of emp_id participated is 3

Similarly it continues till end.

Note : First day who participated need to be consider till end. They are participating.

A8 as joined in middle as this emp_id is not part of first day so in count it is not considered.

So my expected output is below need to be achieved

emp_date   cout_participated 
01-04-2024 5  
02-04-2024 3
03-04-2024 2
04-04-2024 1

My trials not implemented fully

WITH get_initial_day_emp_participated AS ( 
   -- Fetching all emp_id who initially participated
    SELECT
        *
    FROM
        employee
    WHERE
        emp_date = (
            SELECT
                MIN(emp_date)
            FROM
                employee
        )
), get_count_initial_day_participated AS ( 
  -- Using group getting the count of emp_id who initially participated 
    SELECT
        emp_date,
        COUNT(DISTINCT emp_id)
    FROM
        get_initial_day_emp_participated
    GROUP BY
        emp_date
), get_details_except_first_date AS (
   -- Fetching all details except first day participated
    SELECT
        *
    FROM
        employee
    WHERE
        emp_date != (
            SELECT
                MIN(emp_date)
            FROM
                employee
        )
)
SELECT
    *
FROM
    get_details_except_first_date;

For you reference giving create table and insert details

create table employee
(
   EMP_ID     varchar2(10),
   EMP_DATE   date,
   EMP_TASK   number
);

Insert into employee (EMP_ID,EMP_DATE,EMP_TASK) values ('A1',to_date('01-04-2024','DD-MM-YYYY'),345);
Insert into employee (EMP_ID,EMP_DATE,EMP_TASK) values ('A2',to_date('01-04-2024','DD-MM-YYYY'),546);
Insert into employee (EMP_ID,EMP_DATE,EMP_TASK) values ('A3',to_date('01-04-2024','DD-MM-YYYY'),232);
Insert into employee (EMP_ID,EMP_DATE,EMP_TASK) values ('A4',to_date('01-04-2024','DD-MM-YYYY'),8000);
Insert into employee (EMP_ID,EMP_DATE,EMP_TASK) values ('A5',to_date('01-04-2024','DD-MM-YYYY'),2344);
Insert into employee (EMP_ID,EMP_DATE,EMP_TASK) values ('A1',to_date('02-04-2024','DD-MM-YYYY'),456);
Insert into employee (EMP_ID,EMP_DATE,EMP_TASK) values ('A2',to_date('02-04-2024','DD-MM-YYYY'),9280);
Insert into employee (EMP_ID,EMP_DATE,EMP_TASK) values ('A3',to_date('02-04-2024','DD-MM-YYYY'),324);
Insert into employee (EMP_ID,EMP_DATE,EMP_TASK) values ('A2',to_date('02-04-2024','DD-MM-YYYY'),754);
Insert into employee (EMP_ID,EMP_DATE,EMP_TASK) values ('A8',to_date('02-04-2024','DD-MM-YYYY'),75);
Insert into employee (EMP_ID,EMP_DATE,EMP_TASK) values ('A2',to_date('03-04-2024','DD-MM-YYYY'),400);
Insert into employee (EMP_ID,EMP_DATE,EMP_TASK) values ('A3',to_date('03-04-2024','DD-MM-YYYY'),234);
Insert into employee (EMP_ID,EMP_DATE,EMP_TASK) values ('A3',to_date('04-04-2024','DD-MM-YYYY'),100);

Solution

  • This isn't recursive, but an alternative method is to count the number of days that have elapsed, and the number of days that an emp_id has been active:

    select emp_id, emp_date,
      emp_date - min(emp_date) over() + 1 as day_num,
      count(*) over (partition by emp_id order by emp_date) emp_day_num
    from (
      select distinct emp_id, emp_date
      from employee
    )
    order by emp_date, emp_id
    
    EMP_ID EMP_DATE DAY_NUM EMP_DAY_NUM
    A1 01-APR-24 1 1
    A2 01-APR-24 1 1
    A3 01-APR-24 1 1
    A4 01-APR-24 1 1
    A5 01-APR-24 1 1
    A1 02-APR-24 2 2
    A2 02-APR-24 2 2
    A3 02-APR-24 2 2
    A8 02-APR-24 2 1
    A2 03-APR-24 3 3
    A3 03-APR-24 3 3
    A3 04-APR-24 4 4

    The subquery is because count(distinct emp_id) over ... isn't allowed.

    The min(emp_date) over() is the analytic version on min(), and gives the earliest date across the whole data set, but available as a value to every row. Then emp_date - min(emp_date) over() gives the offset of each row's date from that minimum; adding 1 just makes it 1-4 instead of 0-3 to make it easier to compare.

    count(*) over (partition by emp_id order by emp_date) is an analytic count that shows how many dates (distinct, because of the subquery) each emp_id has appeared so far, up to the current row's date. If an ID starts late or skips a date that count will no longer align with the day number.

    Then see if they match, and count those that do:

    select emp_date, count(emp_id)
    from (
      select emp_id, emp_date,
        emp_date - min(emp_date) over() + 1 as day_num,
        count(*) over (partition by emp_id order by emp_date) emp_day_num
      from (
        select distinct emp_id, emp_date
        from employee
      )
    )
    where emp_day_num = day_num
    group by emp_date
    order by emp_date
    
    EMP_DATE COUNT(EMP_ID)
    01-APR-24 5
    02-APR-24 3
    03-APR-24 2
    04-APR-24 1

    fiddle


    If you want a recursive approach then you could use a recursive CTE:

    with rcte (emp_id, emp_date) as (
      select emp_id, emp_date
      from employee
      where emp_date = (
        select min(emp_date) from employee
      )
      union all
      select r.emp_id, e.emp_date
      from rcte r
      join employee e
      on e.emp_id = r.emp_id
      and e.emp_date = r.emp_date + 1
    )
    select emp_date, count(distinct emp_id)
    from rcte
    group by emp_date
    order by emp_date
    
    EMP_DATE COUNT(DISTINCTEMP_ID)
    01-APR-24 5
    02-APR-24 3
    03-APR-24 2
    04-APR-24 1

    fiddle

    The anchor branch is similar to your first CTE, the recursive branch looks for a matching ID the next day. For each row in the previous rcte result, it looks in employee for a row with same ID and the next day (which is r.emp_date + 1). Once there is a gap for an ID it stop appearing.