I have a MS SQL Server table T1
with three code columns: CodeA
, CodeB1
and CodeB2
. And I have a table type variable with the same columns.
I must join the table variable with the T1
table in order to get the lines in table variable that match for CodeB1
and/or CodeB2
, but not CodeA
, or match CodeA
, but neither CodeB1
nor CodeB2
.
I initially I did a single SELECT
statement like that:
SELECT *
(SELECT
CASE WHEN t.CodeA = v.CodeA then 1 else 0 end as [EqualCodeA],
CASE WHEN t.CodeB1 = v.CodeB1 then 1 else 0 end as [EqualCodeB1],
CASE WHEN t.CodeB2 = v.CodeB2 then 1 else 0 end as [EqualCodeB2]
FROM @tableVariable v
INNER JOIN [T1] t
ON t.CodeA = v.CodeA or
t.CodeB1 = v.CodeB1 or
t.CodeB2 = v.CodeB2
)
WHERE NOT(EqualCodeA = 1 AND (EqualCodeB1 = 1 OR EqualCodeB2 = 1)
But that query has very poor performance. So I switched the OR
in predicate for UNION
, like that:
SELECT *
(SELECT
CASE WHEN t.CodeA = v.CodeA then 1 else 0 end as [EqualCodeA],
CASE WHEN t.CodeB1 = v.CodeB1 then 1 else 0 end as [EqualCodeB1],
CASE WHEN t.CodeB2 = v.CodeB2 then 1 else 0 end as [EqualCodeB2]
FROM @tableVariable v
INNER JOIN [T1] t
ON t.CodeA = v.CodeA
UNION
SELECT
CASE WHEN t.CodeA = v.CodeA then 1 else 0 end as [EqualCodeA],
CASE WHEN t.CodeB1 = v.CodeB1 then 1 else 0 end as [EqualCodeB1],
CASE WHEN t.CodeB2 = v.CodeB2 then 1 else 0 end as [EqualCodeB2]
FROM @tableVariable v
INNER JOIN [T1] t
ON t.CodeB1 = v.CodeB1
UNION
SELECT
CASE WHEN t.CodeA = v.CodeA then 1 else 0 end as [EqualCodeA],
CASE WHEN t.CodeB1 = v.CodeB1 then 1 else 0 end as [EqualCodeB1],
CASE WHEN t.CodeB2 = v.CodeB2 then 1 else 0 end as [EqualCodeB2]
FROM @tableVariable v
INNER JOIN [T1] t
ON t.CodeB2 = v.CodeB2)
)
WHERE NOT(EqualCodeA = 1 AND (EqualCodeB1 = 1 OR EqualCodeB2 = 1)
The performance is now about ten times better, but it's still unacceptable. E.g. for 10K rows in table variable and 50K rows in T1
, the query takes two minutes.
Looking at the actual execution plan, I see two Hash Match (Union)
operations, each accounting for 46% of the cost.
How could I improve this performance?
Note 1: there is a Non-clustered index including all three columns, as well as three separate non-clustered indexes, one for each column.
Note 2: I use the OPTION(RECOMPILE);
in order for the optimizer at least to know the table variable actual number of rows.
To be honest I don't really see how the example code matches the description of your requirement, but assuming the latter is correct, this is what I came up with:
-- in order to get the lines in table variable that match for CodeB1 and/or CodeB2, but not CodeA,
-- or match CodeA, but neither CodeB1 nor CodeB2.
SELECT v.*
FROM @tableVariable v
JOIN T1 t
ON (v.CodeA <> t.CodeA AND (v.CodeB1 = t.CodeB1 OR v.CodeB2 = t.CodeB2))
OR (v.codeA = t.CodeA AND v.CodeB1 <> t.CodeB1 AND v.codeB2 <> t.codeB2)
GO
-- convert OR into UNION
SELECT v.*
FROM @tableVariable v
JOIN T1 t
ON (v.CodeA <> t.CodeA AND (v.CodeB1 = t.CodeB1 OR v.CodeB2 = t.CodeB2))
UNION
SELECT *
FROM @tableVariable v
JOIN T1 t
ON (v.codeA = t.CodeA AND v.CodeB1 <> t.CodeB1 AND v.codeB2 <> t.codeB2)
GO
-- further convert OR into UNION
SELECT v.*
FROM @tableVariable v
JOIN T1 t
ON v.CodeA <> t.CodeA
AND v.CodeB2 = t.CodeB2
UNION
SELECT v.*
FROM @tableVariable v
JOIN T1 t
ON v.CodeA <> t.CodeA
AND v.CodeB1 = t.CodeB1
UNION
SELECT v.*
FROM @tableVariable v
JOIN T1 t
ON v.codeA = t.CodeA
AND v.CodeB1 <> t.CodeB1
AND v.codeB2 <> t.codeB2
-- potentially helpfull indexes
CREATE INDEX idx1 ON T1 (CodeA) INCLUDE (CodeB1, CodeB2)
CREATE INDEX idx2 ON T1 (CodeB1) INCLUDE (CodeA)
CREATE INDEX idx3 ON T1 (CodeB2) INCLUDE (CodeA)
It might well be this has the exact same cost as your solution, the optimizer might be able to (internally) convert them into the exact same action. It'd be interesting to see the query-plans and/or get a better idea about the data at hand.
PS: and as already mentioned elsewhere, try to avoid @tableVariables, #tempTables are much better when handling more than a few records. (you can put indexes, statistics etc... as needed)