Search code examples
mysqlsqlrelational-division

Why are double nested NOT EXISTS statements unavoidable in SQL


This is more out of curiosity/scientific interest than based on a real problem.

A programming language is supposed to be a tool and tools are made to make working easier. So why is it that you can find all entries in a one table by simply doing SELECT * FROM foo WHERE bar=42; but as soon as there are multiple tables involved, there is no easy/intuitive way to say "find all tuples meeting this criteria"?

The example is about components which are shipped to different projects in different cities. Primary keys in bold.

component(**CNR**,Cname,Color,Weight,City)
project(**PNR**,Pname,City)
scp(**SNR**,**CNR**,**PNR**,Quantity)

The task is to write a query to find all components that are/were shipped to all projects in one specific city.

The given solution:

SELECT CNR
FROM components
WHERE NOT EXISTS (
    SELECT 1
    FROM project
    WHERE project.city = 'Foobar Town'
    AND NOT EXISTS (
        SELECT 1
        FROM scp
        WHERE scp.PNR = projekt.PNR
        AND scp.CNR = components.CNR ));

It is correct, makes sense and even works. But it is not intuitive--a straightforward statement of "get me this". Why?

The example and all the material in our course is based on SQL92.


Solution

  • Your question is: "Find all components that are/were shipped to all projects in one specific city." You are rephrasing this as "Find all components where there is no project in a given city that doesn't have the component."

    I'd be more inclined to answer this directly:

    select scp.component
    from scp join
         projects p
         on scp.pnr = p.pnr
    where p.city = 'Foobar Town'
    group scp.component
    having count(distinct scp.pnr) = (select count(distinct pnr)
                                      from projects
                                      where city = 'Foobar Town'
                                     );
    

    This counts the number of distinct projects in the city and compares them to the number of projects in the city (the distinct id probably not necessary in the subquery.

    First, I'm not sure if this is simpler. Second, I'm the first to admit that the NOT EXISTS method may be more efficient, although the nesting of NOT EXISTS in subqueries may be detrimental to performance. I do, however, think that this is easier to follow.