Search code examples
sql-servert-sqlsql-server-2014

How to optimize the insert query from multiple tables?


I have 2 tables, Table 1 (temp table in SP) has around 400 records. Table 2 has around 30,550,284 records.

I need to run a loop on table 1 for each record and get the top 1 from table 2 based on a few conditions (where clause) and then order by modified date in decreasing order.

There is an index on the modified date.

declare @iPos int;
declare @iCount int;
select @iCount = count(*) from Table1;
set @iPos = 1;

declare @Table2 table(......)
declare @timestampLocal2 datetime
while (@iPos <= @iCount)
BEGIN
   select @val1 = Col1, @timestampLocal = TimeStamp  
   from @Table1 where ID = @iPos

   set @timestampLocal2 = DATEADD(HH,-96,@timestampLocal) 
   INSERT INTO @Temp3 ( .... ),....)
    select   top 1 r.LastModified, r.[Col2], r.Col3, @iPos
    from Table2  (NOLOCK) r 
    where  Col1 =@val1 and  
    r.LastModified <= @timestampLocal
    and r.LastModified >=  @timestampLocal2
    and (r.Col2 is not null and r.Col3 is not null) 
    order by LastModified  desc

    SELECT @iPos = @iPos + 1;
END

This query is very slow.

I have also thought to archive table 2, But I want to keep that as the second option for now.

Do I really need to add an index on the columns which are involved in the where clause?

So my question is, in terms of performance is there a better way to do this?


Solution

  • I believe a CROSS APPLY or OUTER APPLY may do the trick. These can be thought of as being similar to INNER JOIN or LEFT JOIN, except that they allow you to reference a subquery having more complex conditions such as TOP 1 and ORDER BY. Ideal for cases like this.

    -- INSERT INTO @Temp3 ( .... )
    select r.LastModified, r.[Col2], r.Col3, t1.ID
    from @Table1 t1
    cross apply (
        SELECT TOP 1 r.*
        from Table2 r -- Don't use (NOLOCK)
        where r.Col1 = t.Col1
        and r.LastModified <= t1.[TimeStamp]
        and r.LastModified >= DATEADD(HH,-96,t1.[TimeStamp])
        and (r.Col2 is not null and r.Col3 is not null) 
        order by r.LastModified desc
    ) r
    

    For efficiency, I recommend an index on Table2(Col1,LastModified) or as an absolute minimum, an index on Table2(Col1).

    I would strongly discourage the use of (NOLOCK) or 'READ UNCOMMITTED` in queries that update the database (like the insert into table3 above). While the query may appear to work most of the time, seemingly random occurrences of missing or duplicate rows may result.

    Do you need to handle cases where no matching Table2 record is found? The above will quietly ignore such cases. Changing the CROSS APPLY to an OUTER APPLY together with logic to handle null r.xxx values could be what you need.