Search code examples
databaserelational

Database Design Relational Algebra query


I have this schema:

Suppliers(sid: integer, sname: string, address: string)
Parts(pid: integer, pname: string, color: string)
Catalog(sid: integer, pid: integer, cost: real)

And this task:

Find the sids of suppliers who supply every part.

What I don't understand is why in this solution we don't work with a negation. I was tempted to put C1.pid <> P.pid instead of C1.pid = P.pid in the end. Can someone explain?

SELECT C.sid
FROM Catalog C
WHERE NOT EXISTS (SELECT P.pid
FROM Parts P
WHERE NOT EXISTS (SELECT C1.sid
FROM Catalog C1
WHERE C1.sid = C.sid
AND C1.pid = P.pid))

Solution

  • Let's say you have 2 parts and 1 supplier. The supplier has both parts. If you join on <>, your innermost subquery will get two rows back: one for the Catalog entry for Part #1 (because Part #1 <> Part #2 is true); and one for the Catalog entry for Part #2 (likewise).

    Your reasoning isn't entirely off, but the way to do that is not to use an inequality, but rather to use an outer join and test for the missing record on the "outer" table:

    SELECT c.sid
    FROM   catalog c
    WHERE  NOT EXISTS
              (SELECT c1.sid
               FROM   catalog c1 LEFT JOIN parts p ON c1.pid = p.pid
               WHERE  c.sid = c1.sid AND p.pid IS NULL)
    

    Personally, I find the nested not exists to be a little confusing and needlessly complex. I would be more likely to solve this problem using count:

    SELECT   c.sid
    FROM     catalog c
    GROUP BY c.sid
    HAVING   COUNT (DISTINCT c.pid) = (SELECT COUNT (*) FROM parts)