Search code examples
sqlsubqueryexistscorrelated-subquery

SQL EXISTS returns all rows, more than two tables


I know similar questions like this have been asked before, but I have not seen one for more than 2 tables. And there seems to be a difference.

I have three tables from which I need fields, customers where I need customerID and orderID from, orders from which I get customerID and orderID and lineitems from which I get orderID and quantity (= quantity ordered).

I want to find out how many customers bought more than 2 of the same item, so basically quantity > 2 with:

SELECT COUNT(DISTINCT custID) 
FROM customers
WHERE EXISTS(
    SELECT *
    FROM customers C, orders O, lineitems L
    WHERE C.custID = O.custID AND O.orderID = L.orderID AND L.quantity > 2
    );

I do not understand why it is returning me the count of all rows. I am correlating the subqueries before checking the >2 condition, am I not?

I am a beginner at SQL, so I'd be thankful if you could explain it to me fundamentally, if necessary. Thanks.


Solution

  • You don't have to repeat customers table in the EXISTS subquery. This is the idea of correlation: use the table of the outer query in order to correlate.

    SELECT COUNT(DISTINCT custID) 
    FROM customers c
    WHERE EXISTS(
        SELECT *
        FROM orders O
        JOIN lineitems L ON O.orderID = L.orderID
        WHERE C.custID = O.custID AND L.quantity > 2
        );