sqldb2

SQL puzzle - insert missing rows for all the binary features


I'm trying to solve a puzzle but can't get my head around it. The solution should preferably be in DB2, but other SQL variants are also acceptable.

Let's say we have this table listing owners and their cars (each owner can have more than one car):

car_owners:

Owner Make Model Address Phone
1 Toyota Corolla first street 11111
1 Ford Mustang first street 11111
2 Toyota Corolla second street 22222
2 Audi A7 second street 22222
2 Porsche 911 second street 22222
3 Toyota Prius third street 33333

And then we have this table with car features and their codes:

car_features_codes:

Code Feature
1 sedan
2 not sedan
3 ABS
4 no ABS
5 automatic
6 not automatic
7 gasoline
8 no gasoline

Note that this table has 4 features each with a "yes" or "no" code, thus having a total of 8 codes, and these codes are sequential numbers.

For better visualization, the above table could be presented like this:

Feature Yes No
Sedan 1 2
ABS 3 4
Automatic 5 6
Gasoline 7 8

And, finally, we have this table of car features:

car_features:

Owner Make Model Feature
1 Ford Mustang 2
1 Ford Mustang 3
1 Ford Mustang 6
1 Ford Mustang 7
3 Toyota Prius 1
3 Toyota Prius 5
2 Porsche 911 2
2 Porsche 911 3
2 Porsche 911 5
2 Porsche 911 7

Note that of all the 4 features, the Mustang and the Porsche has all of them, the Prius has only 2, and the other cars don't have any.

Now, the challenge is to fill in the last table with all the features of all the cars of the first table, using the "negative" codes of the second table as default. For example: if a car has no code 7 nor 8, then fill in with 8 (no gasoline). If 5 or 6 are missing, fill in with 6. If 1 or 2 are missing, fill in with 2. Etc.

The final "car_features" table should have a total of X*4 rows (X being the number of cars of the first table, each car having all the 4 features), preserving the rows that are already in the table.

The final table should look like this:

Owner Make Model Feature
1 Ford Mustang 2
1 Ford Mustang 3
1 Ford Mustang 6
1 Ford Mustang 7
3 Toyota Prius 1
3 Toyota Prius 5
2 Porsche 911 2
2 Porsche 911 3
2 Porsche 911 5
2 Porsche 911 7
3 Toyota Prius 4
3 Toyota Prius 8
1 Toyota Corolla 2
1 Toyota Corolla 4
1 Toyota Corolla 6
1 Toyota Corolla 8
2 Toyota Corolla 2
2 Toyota Corolla 4
2 Toyota Corolla 6
2 Toyota Corolla 8
2 Audi A7 2
2 Audi A7 4
2 Audi A7 6
2 Audi A7 8

Note: the first 10 rows are the same as before. Then I completed with the other features missing for all cars, each with the "negative" option by default, like, if there's no ABS indicator (no "yes" or "no" codes), then I fill with "no ABS" ("no" code), and repeat for all the other features.

I have tried multiple queries using the INSERT statement, but can't figure out how to do it.


Solution

  • To generate rows that do not exist a CROSS JOIN (Cartesian product) of 2 sets can be used. Here we need the existing car owners cross joined to negative features. This ensures that all owners and all negative codes exist in the result.

    As there is a postive and matching negative feature we need to ensure that if either of a pair exist in the current table that we don't negate the existing positive or vice-versa. So, we left join that result to the existing car owner information to identify any missing rows, but also check that the data does not already exist using not exists:

    INSERT INTO car_features (Owner, Make, Model, Feature)
    WITH all_combinations AS (
        SELECT o.Owner, o.Make, o.Model, c.Code as Feature
        FROM car_owners o
        CROSS JOIN car_features_codes c
        WHERE c.Code IN (2,4,6,8)
    ),
    existing_features AS (
        SELECT o.Owner, o.Make, o.Model, f.Feature, CASE WHEN f.Feature % 2 = 0 THEN f.Feature - 1 ELSE f.Feature + 1 END as OppositeFeature
        FROM car_owners o
        JOIN car_features f ON o.Owner = f.Owner AND o.Make = f.Make AND o.Model = f.Model
    )
    SELECT ac.Owner, ac.Make, ac.Model, ac.Feature
    FROM all_combinations ac
    WHERE NOT EXISTS (
        SELECT 1 
        FROM existing_features ef 
        WHERE ac.Owner = ef.Owner
        AND ac.Make = ef.Make 
        AND ac.Model = ef.Model
        AND (ac.Feature = ef.Feature OR ac.Feature = ef.OppositeFeature)
        )
    

    rows after the insets:

    +-------+---------+---------+---------+
    | OWNER |  MAKE   |  MODEL  | FEATURE |
    +-------+---------+---------+---------+
    |     1 | Ford    | Mustang |       2 |
    |     1 | Ford    | Mustang |       3 |
    |     1 | Ford    | Mustang |       6 |
    |     1 | Ford    | Mustang |       7 |
    |     1 | Toyota  | Corolla |       2 |
    |     1 | Toyota  | Corolla |       4 |
    |     1 | Toyota  | Corolla |       6 |
    |     1 | Toyota  | Corolla |       8 |
    |     2 | Audi    | A7      |       2 |
    |     2 | Audi    | A7      |       4 |
    |     2 | Audi    | A7      |       6 |
    |     2 | Audi    | A7      |       8 |
    |     2 | Porsche | 911     |       2 |
    |     2 | Porsche | 911     |       3 |
    |     2 | Porsche | 911     |       5 |
    |     2 | Porsche | 911     |       7 |
    |     2 | Toyota  | Corolla |       2 |
    |     2 | Toyota  | Corolla |       4 |
    |     2 | Toyota  | Corolla |       6 |
    |     2 | Toyota  | Corolla |       8 |
    |     3 | Toyota  | Prius   |       1 |
    |     3 | Toyota  | Prius   |       4 |
    |     3 | Toyota  | Prius   |       5 |
    |     3 | Toyota  | Prius   |       8 |
    +-------+---------+---------+---------+
    

    See DB2 fiddle

    nb this solution assumes the 8 codes are in pairs and that odd integers are the positives and "+1" will be the matching negative. If this is not true, then you may need to provide a "mapping table" of positive to negative of each basic feature.