Search code examples
sqlsql-serversql-updatesubqueryquery-optimization

SQL Server update query extremely slow


I want to a simple update statement on a table with c.200m records. However, it seems to be taking ages.

UPDATE a
SET hybrid_trade_flag = CASE WHEN b.trade_id IS NOT NULL THEN 'Y' ELSE 'N' END
FROM [tbl_master_trades] a
LEFT JOIN [tbl_hybrid_trades_subset] b
    ON a.trade_id = b.trade_id

The 1st table tbl_master_trades has c.200m records and has an index created on trade_id and another column (together). The 2nd table tbl_hybrid_trades_subset has around 200k. This query ran for over 40 mins before I had to cancel it (cancellation itself took around 30 min).

I thought maybe converting the 2nd table into a temp table and splitting the statement would help, so converted it into the following:

UPDATE a
SET hybrid_trade_flag = 'Y'
FROM [tbl_master_trades] a
INNER JOIN #tmp_hybrid_trades b
    ON a.trade_id = b.trade_id

UPDATE a
SET hybrid_trade_flag = 'N'
FROM [tbl_master_trades] a
WHERE hybrid_trade_flag IS NULL

Even above two queries took 30 min to run. I need to run several such updates (c.80) on the 1st table, so I'm not sure if this is viable as it would take days! Can someone please advise on if/how I can speed this up?


Solution

  • I would start by rewriting the query to use exists:

    update t
    set hybrid_trade_flag = case 
        when exists(select 1 from tbl_hybrid_trades_subset ts where ts.trade_id = t.trade_id)
        then 'Y'
        else 'N'
    end
    from tbl_master_trades t
    

    Then, I would recommend an index on tbl_hybrid_trades_subset(trade_id) so the subquery can execute quickly.

    An index on tbl_master_trades(trade_id) might also help (without any other column in the index), but the index on the table that the subquery addresses seems more important.

    That said, 200M rows is still a large number of rows to proceed, so the query will probably take quite a lot of time anyway.