Apologies, but I can't offer a reproducible example for this. It won't reproduce with a simple example, so it must be related to our data structures and volume; I'm hoping someone can see a pattern here and offer some advice.
We previously had a stored procedure which felt oddly written to me but worked fine; it ran code equivalent to the following pseudocode:
DECLARE @HasResults BIT = 0;

IF (SELECT COUNT(*)
    FROM myTable t
    WHERE t.field1 = @param1
       OR t.field2 = @param2
       OR t.field3 = @param3
       OR t.field4 = @param4) > 0
    SET @HasResults = 1;

SELECT @HasResults AS HasResults;
All of the fields and params were originally integers, and in normal usage all but one of them will be NULL. I had to change one of the params to an nvarchar(max) so that it could take a list, which I split with a fairly standard splitting function and then use in an IN clause:
DECLARE @HasResults BIT = 0;

IF (SELECT COUNT(*)
    FROM myTable t
    WHERE t.field1 = @param1
       OR t.field2 = @param2
       OR t.field3 IN (SELECT item FROM dbo.fnSplit(@param3, DEFAULT))
       OR t.field4 = @param4) > 0
    SET @HasResults = 1;

SELECT @HasResults AS HasResults;
This resulted in the query, in some circumstances, going from sub-second to over a minute. Now you might expect that from an IN comparison, but what baffles me is that if there's data in @param3 it works fine; it's when @param3 is NULL that the query is slow. If you comment out the IN clause, it goes back to sub-second speed.
The splitting function isn't the problem here: it's very fast, and I've experimented with it, but nothing improves.
To further confuse me, I discovered that you can significantly improve the situation by removing that unnecessary IF statement. The following takes about 10 seconds to run, which is much slower than the original query but much faster than with the IF:
SELECT COUNT(*)
FROM myTable t
WHERE t.field1 = @param1
   OR t.field2 = @param2
   OR t.field3 IN (SELECT item FROM dbo.fnSplit(@param3, DEFAULT))
   OR t.field4 = @param4;
Why is this query running so much slower when I try to split a NULL value and use the results in an IN clause, and why does the IF have such an impact?
EDIT: split function as requested:
CREATE FUNCTION [dbo].[fnSplit] (
      @sInputList VARCHAR(MAX)          -- delimited list to split
    , @sDelimiter VARCHAR(MAX) = ','    -- delimiter, defaults to comma
)
RETURNS @List TABLE (item VARCHAR(MAX))
BEGIN
    DECLARE @sItem VARCHAR(MAX);

    -- Peel off one item at a time until no delimiter remains
    WHILE CHARINDEX(@sDelimiter, @sInputList, 0) <> 0
    BEGIN
        SELECT
            @sItem      = RTRIM(LTRIM(SUBSTRING(@sInputList, 1, CHARINDEX(@sDelimiter, @sInputList, 0) - 1))),
            @sInputList = RTRIM(LTRIM(SUBSTRING(@sInputList, CHARINDEX(@sDelimiter, @sInputList, 0) + LEN(@sDelimiter), LEN(@sInputList))));

        IF LEN(@sItem) > 0
            INSERT INTO @List SELECT @sItem;
    END

    -- Whatever remains after the last delimiter is the final item
    IF LEN(@sInputList) > 0
        INSERT INTO @List SELECT @sInputList;

    RETURN;
END
GO
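For reference, it behaves like any standard splitter, returning one row per trimmed item. Note that, since it is a table-valued function with a defaulted parameter, the delimiter must be passed explicitly as DEFAULT:

SELECT item FROM dbo.fnSplit('1, 2, 3', DEFAULT);
-- returns three rows: 1, 2, 3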
Understanding why a scalar UDF or a multi-statement table-valued function can kill performance is very simple.
UDFs are written in Transact-SQL, which is an interpreted language, not a compiled one, so the function is called again for every row. This has three consequences:
- the interpreter overhead is paid once per row instead of once per query;
- the optimizer cannot see inside the function, so it falls back on a fixed cardinality guess for its output;
- a query that calls a scalar UDF is forced onto a serial plan, with no parallelism.
So if you want performance, avoid UDFs whenever another solution exists.
You can instead use STRING_SPLIT (available from SQL Server 2016 onwards), which I think is the fastest option, or XML operations.
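For example, here is a minimal sketch of how the predicate from the question could be rewritten without the UDF, assuming field3 is an integer and @param3 holds a comma-separated list. I have also swapped COUNT(*) > 0 for EXISTS, and branched on the NULL case so each branch gets its own plan:

DECLARE @HasResults BIT = 0;

IF @param3 IS NULL
BEGIN
    -- NULL list: field3 can never match, so drop that predicate entirely
    IF EXISTS (SELECT 1 FROM myTable t
               WHERE t.field1 = @param1
                  OR t.field2 = @param2
                  OR t.field4 = @param4)
        SET @HasResults = 1;
END
ELSE
BEGIN
    IF EXISTS (SELECT 1 FROM myTable t
               WHERE t.field1 = @param1
                  OR t.field2 = @param2
                  OR t.field3 IN (SELECT TRY_CAST(value AS INT)
                                  FROM STRING_SPLIT(@param3, ','))
                  OR t.field4 = @param4)
        SET @HasResults = 1;
END

SELECT @HasResults AS HasResults;

EXISTS lets the engine stop at the first matching row instead of counting them all, and the explicit NULL branch means the empty-list case never pays for the split at all.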
In fact, since queries operate on sets and not on individual values, using anything that has an iterative process will kill performance. That includes recursive queries...
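For completeness, if you are on a version older than SQL Server 2016, here is a minimal sketch of the XML approach as an inline table-valued function. The name fnSplitXml is mine, and it assumes the input contains no XML-special characters such as & or <:

CREATE FUNCTION dbo.fnSplitXml (
      @sInputList VARCHAR(MAX)
    , @sDelimiter CHAR(1) = ','
)
RETURNS TABLE
AS
RETURN
    -- Wrap each item in <i> tags, cast to XML, then shred one row per node
    SELECT item = LTRIM(RTRIM(x.i.value('.', 'VARCHAR(MAX)')))
    FROM (SELECT CAST('<i>' + REPLACE(@sInputList, @sDelimiter, '</i><i>')
                      + '</i>' AS XML) AS doc) AS d
    CROSS APPLY d.doc.nodes('/i') AS x(i);
GO

Because it is an inline table-valued function, the optimizer expands it into the calling query's plan, so none of the per-row penalties described above apply.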