Search code examples
sqlsql-server

SQL UPDATE WHERE IN (List) or UPDATE each individually?


I'm doing my best lately to look for the best way to run certain queries in SQL that could potentially be done multiple different ways. Among my research I've come across quite a lot of hate for the WHERE IN concept, due to an inherent inefficiency in how it works.

eg: WHERE Col IN (val1, val2, val3)

In my current project, I'm doing an UPDATE on a large set of data and am wondering which of the following is more efficient: (or whether a better option exists)

UPDATE table1 SET somecolumn = 'someVal' WHERE ID IN (id1, id2, id3 ....);

In the above, the list of ID's can be up to 1.5k ID's.

VS

Looping through all ID's in code, and running the following statement for each:

UPDATE table1 SET somecolumn = 'someVal' WHERE ID = 'theID';

To myself, it seems more logical that the former would work better / faster, because there's less queries to run. That said, I'm not 100% familiar with the in's and out's of SQL and how query queueing works.

I'm also unsure as to which would be friendlier on the DB as far as table locks and other general performance.

General info in case it helps, I'm using Microsoft SQL Server 2014, and the primary development language is C#.

Any help is much appreciated.

EDIT:

Option 3:

UPDATE table1 SET somecolumn = 'someVal' WHERE ID IN (SELECT ID FROM @definedTable);

In the above, @definedTable is a SQL 'User Defined Table Type', where the data inside comes through to a stored procedure as (in C#) type SqlDbType.Structured

People are asking how the ID's come in: ID's are in a List<string>in the code, and are used for other things in the code before then being sent to a stored procedure. Currently, the ID's are coming into the stored procedure as a 'User-Defined Table Type' with only one column (ID's).

I thought having them in a table might be better than having the code concatenate a massive string and just spitting it into the SP as a variable that looks like id1, id2, id3, id4 etc


Solution

  • I'm using your third option and it works great.

    My stored procedure has a table-valued parameter. See also Use Table-Valued Parameters.

    In the procedure there is one statement, no loops, like you said:

    UPDATE table1 SET somecolumn = 'someVal' WHERE ID IN (SELECT ID FROM @definedTable);
    

    It is better to call the procedure once, than 1,500 times. It is better to have one transaction, than 1,500 transactions.

    If the number of rows in the @definedTable goes above, say, 10K, I'd consider splitting it in batches of 10K.


    Your first variant is OK for few values in the IN clause, but when you get to really high numbers (60K+) you can see something like this, as shown in this answer:

    Msg 8623, Level 16, State 1, Line 1 The query processor ran out of internal resources and could not produce a query plan. This is a rare event and only expected for extremely complex queries or queries that reference a very large number of tables or partitions. Please simplify the query. If you believe you have received this message in error, contact Customer Support Services for more information.