I have a query that returns data from a main table (Works) and pulls in some addition data (Publishers) using a sub-query in the SELECT list, that is then grouped up into a CSV list. Something like this:
WID | Work Name | Publishers |
---|---|---|
1 | Work Title 1 | Red Publisher 1, Blue Publisher 1 |
2 | Work Title 2 | Red Publisher 1, Red Publisher 2 |
3 | Work Title 3 | Yellow Publisher 1, Red Publisher 3 |
4 | Work Title 4 | Green Publisher 1 |
5 | Work Title 5 | Red Publisher 2 |
I need to modify the WHERE clause to exclude rows where ALL the linked Publishers per Work appear in another table.
For example, if I wanted to exclude all the 'Red Publisher' data where they are the sole Publisher(s):
ExcludedPublishers |
---|
Red Publisher 1 |
Red Publisher 2 |
Red Publisher 3 |
I found a solution using counts, but the performance is unacceptable, going from a few seconds to 30-40 seconds. The main table has about 3 million rows in total, but the query is filtered on other data. Therefore, the returned record set is much less <= 1000 rows.
This is the query comparing the count of linked records against the count of those linked record which exist in the exclusion table, which should not be equal:
SELECT w.*
, stuff( ( SELECT ',' + p.name FROM Publishers p WHERE p.wid = w.wid FOR XML PATH ('') ), 1, 1, '') publishers
FROM Works w
WHERE w.wid <= 4 -- Limit to the Works returned for this demo
AND ( SELECT count(*) FROM Publishers p2 WHERE p2.wid = w.wid ) -- Number of linked Publishers
!= ( SELECT count(*) -- Number of linked Publishers that exist in the ExcludedPublishers table
FROM ExcludedPublishers xp
WHERE xp.pub_code IN ( SELECT p3.code FROM Publishers p3 WHERE p3.wid = w.wid )
);
Returning:
WID | Work Name | Publishers |
---|---|---|
1 | Work Title 1 | Red Publisher 1, Blue Publisher 1 |
3 | Work Title 3 | Yellow Publisher 1, Red Publisher 3 |
4 | Work Title 4 | Green Publisher 1 |
Here is SQL Fiddle detailing this example: http://sqlfiddle.com/#!18/60e2cf/6 and I'm using SQL Server 2014.
My question is, is there a more performant way to achieve this?
Thanks
You can try the query below
SELECT w.*
, STUFF( ( SELECT ', ' + p.name FROM Publishers p WHERE p.wid = w.wid FOR XML PATH ('') ), 1, 1, '') AS publishers
FROM Works w
JOIN Publishers p ON p.wid = w.wid
WHERE NOT EXISTS (
SELECT 1
FROM ExcludedPublishers
WHERE pub_code = p.code
)