Search code examples
t-sqlsql-server-2017

Return previous value before null value


I'm currently querying a table with missing indexes.

Here is some sample data:

id  dStartDate
126 2010-04-22 00:00:00.000
127 NULL
128 2010-04-29 00:00:00.000
129 2010-05-03 00:00:00.000
130 NULL
131 NULL
132 NULL
133 2010-04-29 00:00:00.000
134 NULL
135 NULL
136 2010-04-29 00:00:00.000
137 NULL
138 NULL
139 2010-04-29 00:00:00.000
140 NULL
141 2010-04-29 00:00:00.000
142 2010-04-29 00:00:00.000
143 NULL
144 NULL

I use the following script to get the missing indexes:

declare @id int
declare @maxid int

set @id = 1
select @maxid = max(idJCMaster) from _btblJCMaster

declare @IDseq table    (id int)

while @id < @maxid --whatever you max is
begin
    insert into @IDseq values(@id)

    set @id = @id + 1
end

select 
    s.id
from        @IDseq s 
left join   _btblJCMaster t on s.id = t.idJCMaster
where t.idJCMaster is null

The above works perfect, however, I would like to see the previous record (that is not null) date, to have an idea when this record was deleted...

I altered the above script to look like this:

declare @id int
declare @maxid int

set @id = 1
select @maxid = max(idJCMaster) from _btblJCMaster

declare @IDseq table    (id int)

while @id < @maxid --whatever you max is
begin
    insert into @IDseq values(@id)

    set @id = @id + 1
end

select 
    s.id
,   t.dStartDate
from        @IDseq s 
left join   _btblJCMaster t on s.id = t.idJCMaster

The results I get looks like this:

Script 2 Results

As can be seen, sometimes there is more than on record missing for those particular indexes...

I'm not really sure how to alter the script to show me the previous date (before null).

In this example, my expected results would be:

Expected Results

Please assist with the expected results?

Your assistance will be much appreciated!

Edit

After the assistance of Ankit, tried the following (his answer):

declare @id int
declare @maxid int

set @id = 1
select @maxid = max(idJCMaster) from _btblJCMaster

declare @IDseq table    (id int)

while @id < @maxid --whatever you max is
begin
    insert into @IDseq values(@id)

    set @id = @id + 1
end

select 
    s.id
,   (SELECT MAX(dStartDate)
            FROM _btblJCMaster
            WHERE id >= t1.idJCMaster) dStartDate
from        @IDseq s 
left join   _btblJCMaster t1 on s.id = t1.idJCMaster

But I still receive NULLS.

I then proceeded to try his first answer, by altering the LAG function a bit and also adding LEAD, with 3 CTE's, but I still get NULLS:

declare @id int
declare @maxid int

set @id = 1
select @maxid = max(idJCMaster) from _btblJCMaster

declare @IDseq table    (id int)

while @id < @maxid --whatever you max is
begin
    insert into @IDseq values(@id)

    set @id = @id + 1
end


;with cte (id, dStartDate, idJCMaster)
as
(
select 
    s.id
,   ISNULL(dStartDate, isnull(LAG(dStartDate) OVER(order by s.id),LEAD(dStartDate) OVER(order by s.id)))
,   IdJCMaster
from        @IDseq s 
left join   _btblJCMaster t1 on s.id = t1.idJCMaster
)
,   cte2 (id,dStartDate, idJCMaster)
as
(
select
    id
,   isnull(dStartDate,LAG(dStartDate) OVER(order by id))
,   idJCMaster
from    cte
)
,   cte3 (id,dStartDate, idJCMaster)
as
(
select
    id
,   isnull(dStartDate,LEAD(dStartDate) OVER(order by id))
,   idJCMaster
from    cte2
)

select
    id
,   isnull(dStartDate,LAG(dStartDate) OVER(order by id))
from    cte3
where   idJCMaster is null

Is there no other easier way to accomplish this?


Solution

  • You can try something along this:

    First we need a mockup table to simulate your issue. Please provide this yourself in your next question. It is always best to provide a self-running, stand-alone sample including DDL, INSERT and your own attempt. Such a simuation is called MCVE.

    DECLARE @tbl TABLE(id INT,  dStartDate DATE);
    INSERT INTO @tbl VALUES
     (126,'2010-04-22 00:00:00.000')
    ,(127,NULL)
    ,(128,'2010-04-29 00:00:00.000')
    ,(129,'2010-05-03 00:00:00.000')
    ,(130,NULL)
    ,(131,NULL)
    ,(132,NULL)
    ,(133,'2010-04-29 00:00:00.000')
    ,(134,NULL)
    ,(135,NULL)
    ,(136,'2010-04-29 00:00:00.000')
    ,(137,NULL)
    ,(138,NULL)
    ,(139,'2010-04-29 00:00:00.000')
    ,(140,NULL)
    ,(141,'2010-04-29 00:00:00.000')
    ,(142,'2010-04-29 00:00:00.000')
    ,(143,NULL)
    ,(144,NULL);
    

    --The query

    WITH cte AS(SELECT id,dStartDate FROM @tbl WHERE dStartDate IS NOT NULL)
    SELECT t.id 
          ,A.gaplessStartDate
    FROM @tbl t
    CROSS APPLY(SELECT TOP 1 cte.dStartDate 
                FROM cte 
                WHERE cte.id<=t.id 
                ORDER BY cte.id DESC) A(gaplessStartDate);
    

    The idea in short:

    We use a CTE first to get a set with non-null rows only.
    Now we can use APPLY to get the fitting row along with the id by calling the top-most of the smaller ids sorted in descending order.

    The approach is sort of a triangle JOIN (Jeff Moden wrote a great article on this). Any row will need a correlated sub-query with an ORDER BY action.

    Hint: With larger set this might be faster if you use an indexed temp table instead of the CTE.