Search code examples
sqlsql-match-all

SQL query for selecting products with same ingredients of other products


I have a database that stores products "available on the market" and products "still in development" in two separate tables (market_product and dev_product). A third table (substance) contains all substances a product can made of. Other two tables (marked_product_comp and dev_product_comp) mantains product compositions.

I want to select products still in development that are made of the same ingredients of marketed products.

In the following (simplified) example the query must select product with ID = 2 from dev_product table.

CREATE table market_product (ID SERIAL PRIMARY KEY);
CREATE table dev_product (ID SERIAL PRIMARY KEY);
CREATE table substance (ID SERIAL PRIMARY KEY);
CREATE table market_product_comp (prodID SERIAL, substID SERIAL, PRIMARY KEY(prodID,substID));
CREATE table dev_product_comp (devID SERIAL, substID SERIAL, PRIMARY KEY(devID,substID));

INSERT INTO market_product VALUES (1),(2);
INSERT INTO dev_product VALUES (1),(2);
INSERT INTO substance VALUES (1),(2),(3);
INSERT INTO market_product_comp VALUES (1,1),(1,2),(2,3);
INSERT INTO dev_product_comp VALUES (1,2),(2,1),(2,2);

How to write such query?


UPDATE:

Sorry, I haven't noticed I asked my question in an ambiguous way.

I want to select products still in development that have the same composition of at least one marketed product. For example, if there is a dev_product made by substances {1,2} and only one market_product made by substances {1,2,3}, I want to discard that dev_product, because it has a different composition. I hope this clarify.


Solution

  • In MySQL:

    SELECT  *
    FROM    dev_product dp
    WHERE   EXISTS
            (
            SELECT  NULL
            FROM    market_product mp
            WHERE   NOT EXISTS
                    (
                    SELECT  NULL
                    FROM    dev_product_comp dpc
                    WHERE   dpc.prodID = dp.id
                            AND NOT EXISTS
                            (
                            SELECT  NULL
                            FROM    market_product_comp mpc
                            WHERE   mpc.prodID = mp.id
                                    AND mpc.substID = dpc.substID
                            )
                    )
                    AND NOT EXISTS
                    (
                    SELECT  NULL
                    FROM    market_product_comp mpc
                    WHERE   mpc.prodID = mp.id
                            AND NOT EXISTS
                            (
                            SELECT  NULL
                            FROM    dev_product_comp dpc
                            WHERE   dpc.prodID = dp.id
                                    AND dpc.substID = mpc.substID
                            )
                    )
    
            )
    

    In PostgreSQL:

    SELECT  *
    FROM    dev_product dp
    WHERE   EXISTS
            (
            SELECT  NULL
            FROM    market_product mp
            WHERE   NOT EXISTS
                (
                SELECT  NULL
                FROM    (
                    SELECT  substID
                    FROM    market_product_comp mpc
                    WHERE   mpc.prodID = mp.ID
                    ) m
                FULL OUTER JOIN
                    (
                    SELECT  substID
                    FROM    dev_product_comp dpc
                    WHERE   dpc.devID = dp.ID
                    ) d
                ON  d.substID = m.substID
                WHERE   d.substID IS NULL OR m.substID IS NULL
                )
            )
    

    Neither from these queries uses COUNT(*): it's enough to find but a single non-matching component to stop evaluating the whole pair.

    See these entries in my blog for explanations: