Search code examples
sqloracleconnect-by

How to improve processing time for queries that use CONNECT BY and LEVEL clause


I was wondering if there's a way to re-write the following oracle SQL script to get a better processing time. This current version is very slow and wouldn't work the larger dataset that I'm working with.

I'm trying to assign a number to each customer based on how many years they have been with the business. And I need it to return the each customer with their start year and then count up to the present. I've tried to explain it below.

For a customer that has been with the business for 5 years I want the script to return the following

Customer code Year Number Number of years
CUST1 2018 0
CUST1 2019 1
CUST1 2020 2
CUST1 2021 3
CUST1 2022 4
CUST1 2023 5

I'm trying to use CONNECT BY and LEVEL for this but it's taking too long.

YEARS AS 
(
  SELECT CUSTOMER_CODE, (MIN_YR+LEVEL-1) AS YR_NUM, ((MIN_YR+LEVEL-1) - MIN_YR) AS YRS
  FROM
(
  SELECT CUSTOMER_CODE, EXTRACT (YEAR FROM MIN_DT) MIN_YR , '2023' AS CUR_YR
  FROM
  (
    SELECT REV.CUSTOMER_CODE, MIN(REV.ORDER_DATE) AS MIN_DT
    FROM REVENUE_TABLE REV 
    JOIN CUSTOMER_DETAILS ACC ON REV.CUSTOMER_CODE = ACC.CUSTOMER_CODE
    GROUP BY REV.CUSTOMER_CODE
  )
)
  CONNECT BY LEVEL <= CUR_YR-MIN_YR
)

SELECT * FROM YEARS

I've used CONNECT BY and LEVEL clauses to generate a number for each customer which would represent the number of years with the business. The script is working but it would take too long for it to be useable. I let the script run for around 4 hours but it didn't finish running. The data volume is too high for this to complete.


Solution

  • Starting with the innermost query, there is probably no use including the customer_details table - you are not using anything in it. Its only possible use is to filter out revenue elements that are not in your customer_details table, but I really doubt you are trying to do that.

    A more succinct approach would be to use a recursive CTE that starts with the earliest date for each customer, then rolls it forward by a year each time until it equals the current year.

    With Recursive Years AS (
        SELECT REV.CUSTOMER_CODE, Year(MIN(REV.ORDER_DATE)) AS Year_Number
        FROM REVENUE_TABLE REV 
        GROUP BY REV.CUSTOMER_CODE
      UNION ALL
        Select CUSTOMER_CODE, Year_Number+1
        From Years
        Where Year_Number < Year(Current_Date)
    )
    Select * from Years
    

    Now you can join the rest of your tables to that result or do whatever else you need with it.