I have 2 tables, Table 1 (temp table in SP) has around 400 records. Table 2 has around 30,550,284 records.
I need to run a loop on table 1 for each record and get the top 1 from table 2 based on a few conditions (where clause) and then order by modified date in decreasing order.
There is an index on the modified date.
declare @iPos int;
declare @iCount int;
select @iCount = count(*) from Table1;
set @iPos = 1;
declare @Table2 table(......)
declare @timestampLocal2 datetime
while (@iPos <= @iCount)
BEGIN
select @val1 = Col1, @timestampLocal = TimeStamp
from @Table1 where ID = @iPos
set @timestampLocal2 = DATEADD(HH,-96,@timestampLocal)
INSERT INTO @Temp3 ( .... ),....)
select top 1 r.LastModified, r.[Col2], r.Col3, @iPos
from Table2 (NOLOCK) r
where Col1 =@val1 and
r.LastModified <= @timestampLocal
and r.LastModified >= @timestampLocal2
and (r.Col2 is not null and r.Col3 is not null)
order by LastModified desc
SELECT @iPos = @iPos + 1;
END
This query is very slow.
I have also thought to archive table 2, But I want to keep that as the second option for now.
Do I really need to add an index on the columns which are involved in the where clause?
So my question is, in terms of performance is there a better way to do this?
I believe a CROSS APPLY
or OUTER APPLY
may do the trick. These can be thought of as being similar to INNER JOIN
or LEFT JOIN
, except that they allow you to reference a subquery having more complex conditions such as TOP 1
and ORDER BY
. Ideal for cases like this.
-- INSERT INTO @Temp3 ( .... )
select r.LastModified, r.[Col2], r.Col3, t1.ID
from @Table1 t1
cross apply (
SELECT TOP 1 r.*
from Table2 r -- Don't use (NOLOCK)
where r.Col1 = t.Col1
and r.LastModified <= t1.[TimeStamp]
and r.LastModified >= DATEADD(HH,-96,t1.[TimeStamp])
and (r.Col2 is not null and r.Col3 is not null)
order by r.LastModified desc
) r
For efficiency, I recommend an index on Table2(Col1,LastModified) or as an absolute minimum, an index on Table2(Col1).
I would strongly discourage the use of (NOLOCK)
or 'READ UNCOMMITTED` in queries that update the database (like the insert into table3 above). While the query may appear to work most of the time, seemingly random occurrences of missing or duplicate rows may result.
Do you need to handle cases where no matching Table2 record is found? The above will quietly ignore such cases. Changing the CROSS APPLY
to an OUTER APPLY
together with logic to handle null r.xxx values could be what you need.