Search code examples
sqlsql-serverwindow-functions

Calculating Overall Success Rate in SQL: Recursive CTE or Alternative Approach?


As input I have 3 fields:

  • Column_A: represents the step of the survey in a chronological order
  • Column_B: represents the number of persons, which participated in they survey step
  • Column_C: represents the number of persons, which were sucessfull in this specific survey step (B >= C)

Now I need to calculate a new Column_D: This column needs to represent the overall number of participants, which were remain successfull after all previous steps. So at best this number can only remain the same or being reduced after each step in case of an unsuccessfull participant.

In Excel this is kinda easy, because there i would calculate (except for row 1): Value_D of former Row - (Value_B - Value_C).

I know of lag and lead in SQL, but how can I take into account the calculation of a former row of the same column? Does it work somehow? Do I need to code a recursive CTE or something?

I think big problem is this: The number of rows (Column_A) isnt the same, so i cant just glue several CTEs together.

Column_A Column_B Column_C Column_D
0 35 35 35
1 35 35 35
2 35 34 34
3 34 33 33
4 33 30 30
5 33 31 28
6 33 33 28

I tried the following, but obviously it gives back the wrong results, because it doesnt take the former calculations in consideration.

SELECT [Column_A]
      ,[Column_B]
      ,[Column_C]
      , CASE WHEN [Column_A] = 0 
             THEN [Column_B]
             ELSE LAG([Column_C], 1, 0) OVER (PARTITION BY 1 
                                              ORDER     BY [Column_A] ASC) - ([Column_B] - [Column_C])
        END AS [Column_D]
FROM dataset

Solution

  • You can combine two window functions:

    • FIRST_VALUE, gathering the first "Column_D" value
    • SUM, gathering the running difference between "Column_B" and "Column_C"

    then you subtract the second window function from the first.

    SELECT *, FIRST_VALUE(Column_D) OVER(ORDER BY Column_A ROWS UNBOUNDED PRECEDING) - 
              SUM(Column_B - Column_C) OVER(ORDER BY Column_A ROWS UNBOUNDED PRECEDING) 
    FROM tab
    

    Output:

    Column_A Column_B Column_C Column_D (No column name)
    0 35 35 35 35
    1 35 35 35 35
    2 35 34 34 34
    3 34 33 33 33
    4 33 30 30 30
    5 33 31 28 28
    6 33 33 28 28

    Check the demo here.