Search code examples
sqlpostgresqlroutescombinatoricsproduct-variations

SQL: Update with all possible combinations


I have a relation

+-----+----+
| seq | id |
+-----+----+
|   1 | A1 |
|   2 | B1 |
|   3 | C1 |
|   4 | D1 |
+-----+----+

and want to join it in PostgreSQL with

+----+-------+
| id | alter |
+----+-------+
| B1 | B2    |
| D1 | D2    |
+----+-------+

so I get all possible combinations of replacement (i.e. the Cartesian product of replacing more or less). So group 1 has no update,group 2 only B2, group 3 only D2 and group 4 both B2 and D2.

The end should look like this, but should be open to more (like an extra D3 for D1)

+-------+-----+----+
| group | seq | id |
+-------+-----+----+
|     1 |   1 | A1 |
|     1 |   2 | B1 |
|     1 |   3 | C1 |
|     1 |   4 | D1 |
|     2 |   1 | A1 |
|     2 |   2 | B2 |
|     2 |   3 | C1 |
|     2 |   4 | D1 |
|     3 |   1 | A1 |
|     3 |   2 | B1 |
|     3 |   3 | C1 |
|     3 |   4 | D2 |
|     4 |   1 | A1 |
|     4 |   2 | B2 |
|     4 |   3 | C1 |
|     4 |   4 | D2 |
+-------+-----+----+

EDIT:

Another possible replacement table could be

+----+-------+
| id | alter |
+----+-------+
| B1 | B2    |
| D1 | D2    |
| D1 | D3    |
+----+-------+

could should result in 6 groups (I hope I haven't forgot a case)

+-------+-----+----+
| group | seq | id |
+-------+-----+----+
|     1 |   1 | A1 |
|     1 |   2 | B1 |
|     1 |   3 | C1 |
|     1 |   4 | D1 |
|     2 |   1 | A1 |
|     2 |   2 | B2 |
|     2 |   3 | C1 |
|     2 |   4 | D1 |
|     3 |   1 | A1 |
|     3 |   2 | B2 |
|     3 |   3 | C1 |
|     3 |   4 | D2 |
|     4 |   1 | A1 |
|     4 |   2 | B2 |
|     4 |   3 | C1 |
|     4 |   4 | D3 |
|     5 |   1 | A1 |
|     5 |   2 | B1 |
|     5 |   3 | C1 |
|     5 |   4 | D2 |
|     6 |   1 | A1 |
|     6 |   2 | B1 |
|     6 |   3 | C1 |
|     6 |   4 | D3 |
+-------+-----+----+

If you have instead three replacements like

+----+-------+
| id | alter |
+----+-------+
| B1 | B2    |
| C1 | C2    |
| D1 | D3    |
+----+-------+

It'll result in 8 groups. What I tried so far was not really helpful:

WITH a as (SELECT * FROM (values (1,'A1'),(2,'B1'), (3,'C1'), (4,'D1')   ) as a1(seq, id) )
, b as (SELECT * FROM (values ('B1','B2'), ('D1','D2')) as b1(id,alter) )
---------
SELECT row_number() OVER (PARTITION BY a.id) as g, * FROM 
a
CROSS JOIN  b as b1
CROSS JOIN  b as b2
LEFT JOIN b as b3 ON a.id=b3.id
ORDER by g,seq;

Solution

  • answer updated after edit to question

    The tricky part in this problem is to generate the powerset of the replacements. However, luckily postgres supports recursive queries & powersets can be computed recursively. Thus we can build a general solution to this problem that will work regardless of the size of your replacement set.

    Lets call the first table source, the 2nd table replacements, and I'll avoid the unsavory name alter for something else:

    CREATE TABLE source (seq, id) as (
      VALUES (1, 'A1'), (2, 'B1'), (3, 'C1'), (4, 'D1')
    );
    CREATE TABLE replacements (id, sub) as (
      VALUES ('B1', 'B2'), ('D1', 'D2')
    );
    

    First powerset of the ids to be replaced needs to be generated. The null-set may be omitted since that won't work with joins anyhow & at the end the source table can be union'ed to the intermediate result to provide the same output.

    In the recursive step, the the JOIN condition rec.id > repl.id ensures that each id is present only once for each generated subset.

    In the final step:

    the cross join fans out the source N times, where N is the number of non-empty combinations of replacements (with variations)

    the group names are generated using a filtered runnign sum on seq.

    the subsets are unnested & the ids replaced using a coalesce if the replacement id equals the source id.

    WITH RECURSIVE rec AS (
      SELECT ARRAY[(id, sub)] subset, id FROM replacements
      UNION ALL
      SELECT subset || (repl.id, sub), repl.id 
      FROM replacements repl 
      JOIN rec ON rec.id > repl.id
    )
    SELECT NULL subset, 0 set_name, seq, id FROM source
    UNION ALL
    SELECT subset
    , SUM(seq) FILTER (WHERE seq = 1) OVER (ORDER BY subset, seq) set_name 
    , seq
    , COALESCE(sub, source.id) id
    FROM rec 
    CROSS JOIN source
    LEFT JOIN LATERAL (
      SELECT id, sub 
      FROM unnest(subset) x(id TEXT, sub TEXT)
      ) x ON source.id = x.id;
    

    Tests

    With replacement values ('B1', 'B2'), ('D1', 'D2'), the query returns 4 groups.

            subset         | set_name | seq | id 
    -----------------------+----------+-----+----
                           |        0 |   1 | A1
                           |        0 |   2 | B1
                           |        0 |   3 | C1
                           |        0 |   4 | D1
     {"(B1,B2)"}           |        1 |   1 | A1
     {"(B1,B2)"}           |        1 |   2 | B2
     {"(B1,B2)"}           |        1 |   3 | C1
     {"(B1,B2)"}           |        1 |   4 | D1
     {"(D1,D2)"}           |        2 |   1 | A1
     {"(D1,D2)"}           |        2 |   2 | B1
     {"(D1,D2)"}           |        2 |   3 | C1
     {"(D1,D2)"}           |        2 |   4 | D2
     {"(D1,D2)","(B1,B2)"} |        3 |   1 | A1
     {"(D1,D2)","(B1,B2)"} |        3 |   2 | B2
     {"(D1,D2)","(B1,B2)"} |        3 |   3 | C1
     {"(D1,D2)","(B1,B2)"} |        3 |   4 | D2
    (16 rows)
    

    With replacement values ('B1', 'B2'), ('D1', 'D2'), ('D1', 'D3'), The query returns 6 groups:

            subset         | set_name | seq | id 
    -----------------------+----------+-----+----
                           |        0 |   1 | A1
                           |        0 |   2 | B1
                           |        0 |   3 | C1
                           |        0 |   4 | D1
     {"(B1,B2)"}           |        1 |   1 | A1
     {"(B1,B2)"}           |        1 |   2 | B2
     {"(B1,B2)"}           |        1 |   3 | C1
     {"(B1,B2)"}           |        1 |   4 | D1
     {"(D1,D2)"}           |        2 |   1 | A1
     {"(D1,D2)"}           |        2 |   2 | B1
     {"(D1,D2)"}           |        2 |   3 | C1
     {"(D1,D2)"}           |        2 |   4 | D2
     {"(D1,D2)","(B1,B2)"} |        3 |   1 | A1
     {"(D1,D2)","(B1,B2)"} |        3 |   2 | B2
     {"(D1,D2)","(B1,B2)"} |        3 |   3 | C1
     {"(D1,D2)","(B1,B2)"} |        3 |   4 | D2
     {"(D1,D3)"}           |        4 |   1 | A1
     {"(D1,D3)"}           |        4 |   2 | B1
     {"(D1,D3)"}           |        4 |   3 | C1
     {"(D1,D3)"}           |        4 |   4 | D3
     {"(D1,D3)","(B1,B2)"} |        5 |   1 | A1
     {"(D1,D3)","(B1,B2)"} |        5 |   2 | B2
     {"(D1,D3)","(B1,B2)"} |        5 |   3 | C1
     {"(D1,D3)","(B1,B2)"} |        5 |   4 | D3
    (24 rows)
    

    With replacement values ('B1', 'B2'), ('C1', 'C2'), ('D1', 'D2'), The query returns 8 groups:

                 subset              | set_name | seq | id 
    ---------------------------------+----------+-----+----
                                     |        0 |   1 | A1
                                     |        0 |   2 | B1
                                     |        0 |   3 | C1
                                     |        0 |   4 | D1
     {"(B1,B2)"}                     |        1 |   1 | A1
     {"(B1,B2)"}                     |        1 |   2 | B2
     {"(B1,B2)"}                     |        1 |   3 | C1
     {"(B1,B2)"}                     |        1 |   4 | D1
     {"(C1,C2)"}                     |        2 |   1 | A1
     {"(C1,C2)"}                     |        2 |   2 | B1
     {"(C1,C2)"}                     |        2 |   3 | C2
     {"(C1,C2)"}                     |        2 |   4 | D1
     {"(C1,C2)","(B1,B2)"}           |        3 |   1 | A1
     {"(C1,C2)","(B1,B2)"}           |        3 |   2 | B2
     {"(C1,C2)","(B1,B2)"}           |        3 |   3 | C2
     {"(C1,C2)","(B1,B2)"}           |        3 |   4 | D1
     {"(D1,D2)"}                     |        4 |   1 | A1
     {"(D1,D2)"}                     |        4 |   2 | B1
     {"(D1,D2)"}                     |        4 |   3 | C1
     {"(D1,D2)"}                     |        4 |   4 | D2
     {"(D1,D2)","(B1,B2)"}           |        5 |   1 | A1
     {"(D1,D2)","(B1,B2)"}           |        5 |   2 | B2
     {"(D1,D2)","(B1,B2)"}           |        5 |   3 | C1
     {"(D1,D2)","(B1,B2)"}           |        5 |   4 | D2
     {"(D1,D2)","(C1,C2)"}           |        6 |   1 | A1
     {"(D1,D2)","(C1,C2)"}           |        6 |   2 | B1
     {"(D1,D2)","(C1,C2)"}           |        6 |   3 | C2
     {"(D1,D2)","(C1,C2)"}           |        6 |   4 | D2
     {"(D1,D2)","(C1,C2)","(B1,B2)"} |        7 |   1 | A1
     {"(D1,D2)","(C1,C2)","(B1,B2)"} |        7 |   2 | B2
     {"(D1,D2)","(C1,C2)","(B1,B2)"} |        7 |   3 | C2
     {"(D1,D2)","(C1,C2)","(B1,B2)"} |        7 |   4 | D2
    (32 rows)