Search code examples
sqlsqlitecommon-table-expressionmultiple-inheritance

How to retrieve the properties stored in SQL with multiple inheritance


I'm storing the records in SQL that represent a multiple inheritance relationship similar to the one in C++. Like that:

CREATE TABLE Classes 
(
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);

CREATE TABLE Inheritance 
(
    class_id INTEGER NOT NULL,
    base_class_id INTEGER NOT NULL,
    FOREIGN KEY (class_id) REFERENCES Classes(id),
    FOREIGN KEY (base_class_id) REFERENCES Classes(id)
);

The classes have properties of two types. These properties are inherited by the classes, but in different ways. The first type type of property whenever defined for the class overrides the value of the same property used in any of base classes. The other type accumulates the value: the property is actually a set of values, each class inherits all values of it's base classes, plus may add an additional (single) value to this set:

CREATE TABLE OverridableValues
(
    class_id INTEGER PRIMARY KEY,
    value TEXT NOT NULL,
    FOREIGN KEY (class_id) REFERENCES Classes(id)
);

CREATE TABLE AccumulableValues
(
    class_id INTEGER PRIMARY KEY,
    value TEXT NOT NULL,
    FOREIGN KEY (class_id) REFERENCES Classes(id)
);

The caveat with OverridableValues: there are no cases when the same property is overridden on different paths of multiple inheritance.

I'm trying to design queries using common table expressions that would return the value/values for a given property and class.

The approach that I'm trying to use is to start from the root (assume for simplicity that there is a single root class), and then to build the tree of paths from the root to every other class. The problem is how to pass the information about properties from the parents to children. For example below is an incorrect attempt to do that:

WITH ParentProperty (id, value) AS 
(
    SELECT c.id, a.value
    FROM Classes c
    LEFT JOIN AccumulableValues a
      ON a.class_id = c.id
    WHERE c.id = 1 --This is the root

    UNION ALL

    SELECT i.class_id, IFNULL(a.value, ba.value)
    FROM ParentProperty p
    JOIN Inheritance i
      ON i.base_class_id = p.id
    LEFT JOIN AccumulableValues a
      ON a.class_id = i.class_id
    LEFT JOIN AccumulableValues ba
      ON ba.class_id = i.base_class_id
)
SELECT id, value
FROM ParentProperty;

I feel like I need one more UNION ALL inside the CTE, which is not allowed. But without it I either miss proper values or inherited ones. So far I've failed to design the query for both types of properties.

I'm using SQLite as my database engine.


Solution

  • Finally I've found a solution. I'm describing it below, but more efficient ones are still welcomed.

    Let's start with the Accumulable property. My problem was that I tried to add more than one UNION ALL into a single CTE. I've solved that with adding additional CTE (see the AcquiresFrom)

    WITH AcquiresFrom (class_id, from_class_id, value) AS
    (
        SELECT a.class_id, a.class_id, a.value
        FROM AccumulatableValues a
    
        UNION ALL
    
        SELECT i.class_id, i.base_class_id, NULL
        FROM Inheritance i
    ),
    ClassProperty (class_id, value) AS
    (
        SELECT c.id, NULL
        FROM Classes c
        LEFT JOIN Inheritance i
          ON i.class_id = c.id
        WHERE i.base_class_id IS NULL
    
        UNION ALL
    
        SELECT a.class_id, IFNULL(a.value, p.value)
        FROM ClassProperty p
        JOIN AcquiresFrom a
          ON (a.from_class_id = p.class_id AND a.from_class_id != a.class_id) OR
             (a.class_id = p.class_id AND a.class_id = a.from_class_id AND p.value IS NULL)
    )
    SELECT DISTINCT class_id, value
    FROM ClassProperty
    WHERE value IS NOT NULL
    ORDER BY class_id;
    

    The AcquiresFrom means the way to aquire the value: the class either introduces a new value (the first clause) or to inherits it (the second clause). The ClassProperty incrementally propagates the values from base classes to derived. The only thing left to do is to eliminate duplicates and NULL values (the last clause SELECT DISTINCT / WHERE value IS NOT NULL).

    The overridable property is more complex.

    WITH Roots (id, value) AS
    (
        SELECT c.id, o.value
        FROM Classes c
        LEFT JOIN Inheritance i
          ON i.class_id = c.id
        LEFT JOIN OverridableValues o
          ON o.class_id = c.id
        WHERE i.base_class_id IS NULL
    ),
    PossibleValues (id, acquired_from_id, value) AS 
    (
        SELECT r.id, r.id, r.value
        FROM Roots r
    
        UNION ALL
    
        SELECT i.class_id, CASE WHEN o.value IS NULL THEN p.acquired_from_id ELSE i.class_id END, IFNULL(o.value, p.value)
        FROM PossibleValues p
        JOIN Inheritance i
          ON i.base_class_id = p.id
        LEFT JOIN OverridableValues o
          ON o.class_id = i.class_id
    ),
    Split (class_id, base_class_id, direct) AS (
        SELECT i.class_id, i.base_class_id, 1
        FROM Inheritance i
        UNION ALL
        SELECT i.class_id, i.base_class_id, 0
        FROM Inheritance i
    ),
    Ancestors (id, ancestor_id) AS (
        SELECT r.id, NULL
        FROM Roots r
    
        UNION ALL
    
        SELECT s.class_id, CASE WHEN s.direct == 1 THEN a.id ELSE a.ancestor_id END
        FROM Ancestors a
        JOIN Split s
          ON s.base_class_id = a.id
    )
    SELECT DISTINCT p.id, p.value
    FROM PossibleValues p
    WHERE p.acquired_from_id NOT IN
    (
        SELECT a.ancestor_id
        FROM PossibleValues p1
        JOIN PossibleValues p2
          ON p2.id = p1.id
        JOIN Ancestors a
          ON a.id = p1.acquired_from_id AND a.ancestor_id = p2.acquired_from_id
        WHERE p1.id = p.id
    );
    

    The Roots is obviously the list of classes that have no parents. The PossibleValues CTE propagates/overrides the values from roots to final classes, and breaks multiple inheritance cycles making the structure a tree-like. All valid id/value pairs are present in the result of this query, however some invalid values are present as well. These invalid values are those that were overridden on one of the branches, but this fact is not known on another branch. The acquired_from_id allows us to reconstruct who was that class that first introduced this value (that may be useful whenever two different classes intruduce the same value).

    The last thing left is to resolve the ambiguity caused by multiple inheritance. Knowing the class and two possible values we need to know whether one value overrides the other. That is resolved with the Ancestors expression.