I would expect the following two queries to yield identical results; however, I have several instances where the first is empty while the second returns several hundred rows. How is this possible? Working in Azure Synapse using T-SQL
SELECT [value] FROM table_a
WHERE [value] NOT IN (SELECT [value] FROM table_b)
SELECT [value] FROM table_a
EXCEPT (SELECT [value] FROM table_b)
You're forgetting two options:
First, an exclusion join:
SELECT [value]
FROM table_a a
LEFT JOIN table_b b on a.[value] = b.[value]
WHERE b.[value] IS NULL
Second, NOT EXISTS()
:
SELECT [value]
FROM table_a a
WHERE NOT EXISTS(SELECT 1 FROM table_b b WHERE b.[VALUE] = a.[VALUE])
I like both of these better than either option presented. But of all four, this last option, with NOT EXISTS()
, tends to perform better. Of course, you should also test to see what actually works best with your data and indexing.
But as to why the results are different... I'd expect to do with NULL
comparisons, since NULL
compared with itself is still false and otherwise confuse certain comparison operations. If table_b
can have NULL
values, that can definitely change things.