Search code examples
sqlsql-serverwindow-functionssql-server-2016

Use LAG with a specific condition on what LAG value to take?


Suppose I have a table as follows:

Row     Volume
1       10000
2       8000
3       0.01
4       0
5       5000
6       0

Now, if I were to use the LAG() function in SQL Server, I believe the following table will result:

Row     Volume     LAG(Volume)
1       10000      NULL
2       8000       10000
3       0.01       8000
4       0          0.01
5       5000       0
6       0          5000

However, this is not exactly what I am looking for, so allow me to provide some background as to why I need the LAG(Volume) column. The reason for it is because small values, such as 0.01 or 0 in my example, are likely to be data processing errors, so I want to override them with the last non-0.01 or non-0 value and then update the table by a JOIN. I believe this requires me to first come up with the LAG(Volume) column, but I would need the LAG() function to ignore some values. In my example, they would be 0.01 and 0. I guess, one could also say that I would like the lagged value to depend on what the lagged value actually is. In other words, for my example, I would like the following table to be produced instead:

Row     Volume     LAG(Volume)
1       10000      NULL
2       8000       10000
3       0.01       8000
4       0          8000
5       5000       8000
6       0          5000

Note that, for Row 6, the lag should be 5000 (and not 8000) as, although its volume is 0, its lag is neither 0 nor 0.01.

Can this be done?

Perhaps if anyone has alternatives as to how I can update these values to the last non-0.01 or non-0 value without having to create an intermediate lag column, please let me know as well.

Any suggestions will be greatly appreciated!


Solution

  • On earlier versions you can use a concatenation approach to simulate IGNORE NULLS. 2016 DB Fiddle.

    It uses MAX with a window of ORDER BY [Row] ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING

    Assuming Row is a positive integer the value going into the MAX is a concatenation of Row and Volume constructed such that later Row numbers will be the MAX - but these are nulled out in the cases that you want to ignore and MAX automatically disregards those.

    The Volume is then extracted out with SUBSTRING and CAST back to decimal - if you are not in fact using decimal(12,2) then change this to what you are using.

    SELECT 
     *,
     CAST(SUBSTRING(MAX(RIGHT(CONCAT('0000000000', [Row]),10) + CASE WHEN Volume > 0.01 THEN CAST(Volume AS VARCHAR(20)) END) OVER (ORDER BY [Row] ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 11, 20) AS decimal(12,2))
    FROM YourTable