Search code examples
sqlsql-serversubquerywhere-clause

SQL query to exclude rows from the main query based on a sub-query that checks ALL rows exist


I have a query that returns data from a main table (Works) and pulls in some addition data (Publishers) using a sub-query in the SELECT list, that is then grouped up into a CSV list. Something like this:

WID Work Name Publishers
1 Work Title 1 Red Publisher 1, Blue Publisher 1
2 Work Title 2 Red Publisher 1, Red Publisher 2
3 Work Title 3 Yellow Publisher 1, Red Publisher 3
4 Work Title 4 Green Publisher 1
5 Work Title 5 Red Publisher 2

I need to modify the WHERE clause to exclude rows where ALL the linked Publishers per Work appear in another table.

For example, if I wanted to exclude all the 'Red Publisher' data where they are the sole Publisher(s):

ExcludedPublishers
Red Publisher 1
Red Publisher 2
Red Publisher 3

I found a solution using counts, but the performance is unacceptable, going from a few seconds to 30-40 seconds. The main table has about 3 million rows in total, but the query is filtered on other data. Therefore, the returned record set is much less <= 1000 rows.

This is the query comparing the count of linked records against the count of those linked record which exist in the exclusion table, which should not be equal:

SELECT w.*
     , stuff( ( SELECT ',' + p.name FROM Publishers p WHERE  p.wid = w.wid FOR XML PATH ('') ), 1, 1, '') publishers
FROM   Works w
WHERE  w.wid <= 4 -- Limit to the Works returned for this demo
 AND   ( SELECT count(*) FROM Publishers p2 WHERE p2.wid = w.wid ) -- Number of linked Publishers
    != ( SELECT count(*) -- Number of linked Publishers that exist in the ExcludedPublishers table
         FROM   ExcludedPublishers xp 
         WHERE  xp.pub_code IN ( SELECT p3.code FROM Publishers p3 WHERE p3.wid = w.wid ) 
       );

Returning:

WID Work Name Publishers
1 Work Title 1 Red Publisher 1, Blue Publisher 1
3 Work Title 3 Yellow Publisher 1, Red Publisher 3
4 Work Title 4 Green Publisher 1

Here is SQL Fiddle detailing this example: http://sqlfiddle.com/#!18/60e2cf/6 and I'm using SQL Server 2014.

My question is, is there a more performant way to achieve this?

Thanks


Solution

  • You can try the query below

     SELECT w.*
        , STUFF( ( SELECT ', ' + p.name FROM Publishers p WHERE  p.wid = w.wid FOR XML PATH ('') ), 1, 1, '')  AS publishers
     FROM Works w
     JOIN Publishers p ON p.wid = w.wid
     WHERE NOT EXISTS (
        SELECT 1
        FROM ExcludedPublishers
        WHERE pub_code = p.code
     )