Search code examples
sqlsql-serverself-joincorrelated-subquery

SQL Server - Correlated Subqueries / Self Join


A select statement returning the name, city, and state of each vendor that's located in a unique city and state (i.e., excludes vendors that have same city and state with another vendor)

SELECT 
    VendorName, VendorCity, VendorState
FROM 
    Vendors
WHERE 
    VendorState + VendorCity NOT IN (SELECT VendorState + VendorCity
                                     FROM Vendors
                                     GROUP BY VendorState + VendorCity
                                     HAVING COUNT(*) > 1)
ORDER BY 
    VendorState, VendorCity;

Alternate answer

SELECT 
    VendorName, VendorCity, VendorState
FROM 
    Vendors AS Vendors_Main
WHERE 
    VendorCity + VendorState NOT IN (SELECT VendorCity + VendorState
                                     FROM Vendors AS Vendors_Sub
                                     WHERE Vendors_Sub.VendorID <> Vendors_Main.VendorID)
ORDER BY 
    VendorState, VendorCity;

I understand the first answer, but not the alternate query. Point of confusion: wouldn't the line below return 0 rows since they're referencing the same table without an additional where clause?

WHERE Vendors_Sub.VendorID <> Vendors_Main.VendorID)

Solution

  • WHERE Vendors_Sub.VendorID <> Vendors_Main.VendorID) doesn't compare the same row of the same table.

    A correlated subquery is logically executed once for each row in the outer query, in your case it checks for rows with the same VendorCity/ VendorState combination, but different VendorIDs.

    In fact I would prefer a direct translation into a correlated NOT EXISTS:

    SELECT VendorName, VendorCity, VendorState
    FROM Vendors AS Vendors_Main
    WHERE NOT EXISTS
     (
       SELECT *
       FROM Vendors AS Vendors_Sub
       WHERE Vendors_Sub.VendorCity  = Vendors_Main.VendorCity  -- same city
         AND Vendors_Sub.VendorState = Vendors_Main.VendorState -- same state
         AND Vendors_Sub.VendorID   <> Vendors_Main.VendorID    -- different vendor
     )
    ORDER BY VendorState, VendorCity;
    

    This prevents false positives like 'state' + 'acity' vs. 'statea' + 'city' which both concatenate to 'stateacity' and works for any kind of datatype.