Search code examples
sqloracle-databaserecursionoracle-sqldeveloper

Count the number of customers with at least one presence in each of the last 3 weeks in Oracle 21c


Hello I'm coming back to you again and maybe you can help me this time too.

Starting from another post of mine and taking inspiration from here https://oracle-base.com//articles/11g/recursive-subquery-factoring-11gr2 I managed to find a solution but I would like to believe there are other methods because if I had to go back 9,10 weeks it would not have been performant at all.

I create db fiddle https://dbfiddle.uk/h1zmicL7 with ddl and dml respectively my solution. I would like someone to help me with a solution using connect by clause starting from my solution or any other solution more performant.

In the table the column "expected_output" have the final result and is added manually by me.

I m using Oracle 21c.

I will explain what the logic is here and how it should be calculated for each client much more clearly - I need all the rows in the present case, there are 21.

Why does client 1 have the first 1 in week 4 - because we look in week 1 to see if we have something in the back for client 1 and we don't, so 0, then in week 2 we also look back and see that he is present only once but we need 3 presence, so still 0, then in week 3 the same, we see 2 presence in the back so still 0 and in week 4 we have the first 1 because this client has 3 consecutive presence in the last 3 weeks and that is important to be consecutive.

The column weeknum is to_number(to_char(data,'ww')).

Why client 1 has in the week 6 a 3, because we look back at every 3 weeks so in week 6 client 1 have at least 1 presence in each consecutive weeks no gaps in 1,2,3(at week 4 first 1) and 2,3,4(at week 5 another 1 and plus 1 he have 2) and week 3,4,5 so in week 6 he had a 3.

Why client 1 has in week 8 a 0 and not a 3 because at week 6 he had a 3? Because again we look back at the last 3 weeks (5,6,7)and we see client 1 with at least 1 presence in a ride in the week 5 and 6- and no presence in week 7 so here he don t have 3 consecutive presences in the last 3 weeks so we need 0.

Like this is for all the clients.

Thank you

Ride  Client    Data         Weeknum     Expected_output
1       1   02-JAN-2023         1               0
2       1   05-JAN-2023         1               0
3       3   06-JAN-2023         1               0
4       1   11-JAN-2023         2               0
5       2   12-JAN-2023         2               0
6       1   16-JAN-2023         3               0
7       2   19-JAN-2023         3               0
8       1   24-JAN-2023         4               1
9       2   24-JAN-2023         4               0
10      1   30-JAN-2023         5               2
11      2   02-FEB-2023         5               1
12      2   08-FEB-2023         6               2
13      1   09-FEB-2023         6               3
14      3   15-FEB-2023         7               0
15      2   16-FEB-2023         7               3
16      3   17-FEB-2023         7               0
17      1   21-FEB-2023         8               0
18      3   22-FEB-2023         8               0
19      3   23-FEB-2023         8               0
20      3   28-FEB-2023         9               0
21      3   07-MAR-2023        10               1

Solution

  • If you particularly want a solution using CONNECT BY then (however, I would expect a solution using MATCH_RECOGNIZE or only analytic functions to be more efficient):

    SELECT client,
           weeknum,
           ride,
           data,
           expected_output,
           GREATEST(weeknum - CONNECT_BY_ROOT(weeknum) - 2, 0) AS output
    FROM   (
      SELECT p.*,
             CASE
             WHEN weeknum <= LAG(weeknum) OVER (PARTITION BY client ORDER BY data) + 1
             THEN 0
             ELSE 1
             END AS start_week,
             ROW_NUMBER() OVER (PARTITION BY client ORDER BY data) AS rn
      FROM   prob p
    )
    START WITH
            start_week = 1
    CONNECT BY
            PRIOR rn + 1 = rn
    AND     PRIOR client = client
    AND     start_week = 0
    

    Which, for the sample data, outputs:

    CLIENT WEEKNUM RIDE DATA EXPECTED_OUTPUT OUTPUT
    1 1 1 2023-01-02 00:00:00 0 0
    1 1 2 2023-01-05 00:00:00 0 0
    1 2 4 2023-01-11 00:00:00 0 0
    1 3 6 2023-01-16 00:00:00 0 0
    1 4 8 2023-01-24 00:00:00 1 1
    1 5 10 2023-01-30 00:00:00 2 2
    1 6 13 2023-02-09 00:00:00 3 3
    1 8 17 2023-02-21 00:00:00 0 0
    2 2 5 2023-01-12 00:00:00 0 0
    2 3 7 2023-01-19 00:00:00 0 0
    2 4 9 2023-01-24 00:00:00 0 0
    2 5 11 2023-02-02 00:00:00 1 1
    2 6 12 2023-02-08 00:00:00 2 2
    2 7 15 2023-02-16 00:00:00 3 3
    3 1 3 2023-01-06 00:00:00 0 0
    3 7 14 2023-02-15 00:00:00 0 0
    3 7 16 2023-02-17 00:00:00 0 0
    3 8 18 2023-02-22 00:00:00 0 0
    3 8 19 2023-02-23 00:00:00 0 0
    3 9 20 2023-02-28 00:00:00 0 0
    3 10 21 2023-03-07 00:00:00 1 1

    fiddle